I am working on sentiment analysis of a Twitter dataset, and for some reason my variables negative_1 and negative_2 show different results.
Here is my code and a screenshot:
b <- bt_sentiment_test %>%
  mutate(
    running_avg = cumsum(sentiment) / row_number(),
    positive    = cumsum(ifelse(sentiment > 0, sentiment, 0)) / row_number(),
    negative_1  = cumsum(abs(ifelse(sentiment < 0, sentiment, 0))) / row_number(),
    negative_2  = cumsum(abs(ifelse(sentiment < 0, sentiment, 0)) / row_number())
  )
What I'm trying to do is take the cumulative sum of the negative values in the sentiment column and then divide it by the row number, but I somehow get different results in negative_1 and negative_2.
I tested this with a simple dataset, and there the a_2 and a_3 results are exactly the same.
library(dplyr)

a <- data.frame(x = -(1:10))
a %>%
  mutate(
    a_1 = cumsum(x),
    a_2 = abs(cumsum(x) / row_number()),
    a_3 = abs(cumsum(x)) / row_number()
  )
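Note that this test does not use quite the same parenthesization as negative_1 and negative_2. As a closer comparison, here is a minimal sketch that applies those two exact expressions to the same toy data. The column names n_1 and n_2 are hypothetical, chosen only for this illustration. In n_1 the division by row_number() is applied to the result of cumsum(); in n_2 the division sits inside cumsum(), so each value is divided by its own row number before being summed.

```r
library(dplyr)

# Toy data: all values negative, so the ifelse() keeps every row
a <- data.frame(x = -(1:10))

res <- a %>%
  mutate(
    # same shape as negative_1: cumulative sum first, then divide by row number
    n_1 = cumsum(abs(ifelse(x < 0, x, 0))) / row_number(),
    # same shape as negative_2: divide each value by its row number, then cumulative sum
    n_2 = cumsum(abs(ifelse(x < 0, x, 0)) / row_number())
  )

res
```

With this data, n_1 is the running mean of 1..10 (1, 1.5, 2, ...), while n_2 adds 1 at every row (1, 2, 3, ...), so the two columns diverge from the second row onward.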
Why is there a difference in my b dataset?
Any suggestions or advice would be appreciated.