I am working on sentiment analysis of a Twitter dataset, and for some reason my variables negative_1 and negative_2 give different results.
The following is my code and a screenshot:
    library(dplyr)  # provides %>%, mutate() and row_number()

    b <- bt_sentiment_test %>%
      mutate(running_avg = cumsum(sentiment) / row_number(),
             positive    = cumsum(ifelse(sentiment > 0, sentiment, 0)) / row_number(),
             negative_1  = cumsum(abs(ifelse(sentiment < 0, sentiment, 0))) / row_number(),
             negative_2  = cumsum(abs(ifelse(sentiment < 0, sentiment, 0)) / row_number()))
What I'm trying to do is take the cumulative sum of the absolute values of the negative entries in the sentiment column and then divide by the row number, but negative_1 and negative_2 somehow give different results.
I tested this with a simple dataset, and there the results for a_2 and a_3 are exactly the same:
    a <- data.frame(x = -(1:10))
    a %>% mutate(a_1 = cumsum(x),
                 a_2 = abs(cumsum(x) / row_number()),
                 a_3 = abs(cumsum(x)) / row_number())
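Looking at it again, the simple test above may not mirror my real code: a_2 and a_3 both call cumsum(x) first and only differ in where abs() is applied, whereas in negative_2 the closing parenthesis puts the division by row_number() inside cumsum(). Here is a minimal base R sketch of that grouping. The vector values and the helper names n and neg are made up for illustration and are not from bt_sentiment_test, and seq_along() stands in for row_number().

```r
# Toy sentiment scores (made-up values, not from bt_sentiment_test)
sentiment <- c(-1, 2, -3, 0, -2)

# Row index; stands in for dplyr's row_number() in this base R sketch
n <- seq_along(sentiment)

# Absolute value of the negative scores, 0 for everything else
neg <- abs(ifelse(sentiment < 0, sentiment, 0))

# Same grouping as negative_1: cumulative sum first, then divide by the row index
negative_1 <- cumsum(neg) / n

# Same grouping as negative_2: divide each value by its row index, then cumulative sum
negative_2 <- cumsum(neg / n)

print(data.frame(sentiment, negative_1, negative_2))
```

If the two columns differ here as well, the difference in b may come from the grouping of the parentheses rather than from the data, but I have not confirmed that.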
Why is there a difference in my b dataset?
Any suggestions or advice would be appreciated.
Thanks!
Andrew