Question regarding the abs() function

I am working on sentiment analysis of a Twitter dataset, and for some reason my variables negative_1 and negative_2 give different results.
Here is my code and a screenshot of the output:

library(dplyr)

b <- bt_sentiment_test %>%
  mutate(running_avg = cumsum(sentiment)/row_number(),
         positive = cumsum(ifelse(sentiment > 0, sentiment, 0))/row_number(),
         negative_1 = cumsum(abs(ifelse(sentiment < 0, sentiment, 0)))/row_number(),
         negative_2 = cumsum(abs(ifelse(sentiment < 0, sentiment, 0))/row_number()))

[Screenshot (2020-11-24): output of b]

What I'm trying to do is take the cumulative sum of the negative values in the sentiment column (as absolute values) and then divide it by the row number, but negative_1 and negative_2 somehow give different results.
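Written out for row n, with s_i standing for the sentiment score of row i (notation introduced here only for illustration), the two expressions as coded compute the following. Note that `ifelse(sentiment < 0, sentiment, 0)` is the same as min(s_i, 0):

$$
\text{negative\_1}_n = \frac{1}{n}\sum_{i=1}^{n} \lvert \min(s_i, 0) \rvert,
\qquad
\text{negative\_2}_n = \sum_{i=1}^{n} \frac{\lvert \min(s_i, 0) \rvert}{i}
$$

The two always coincide at n = 1, but in general they differ from row 2 onward.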

I tested this with a simple dataset, and there a_2 and a_3 are exactly the same:

a <- data.frame(x = -(1:10))
a %>% mutate(a_1 = cumsum(x),
             a_2 = abs(cumsum(x)/row_number()),
             a_3 = abs(cumsum(x))/row_number())
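A possible reason the small test looks fine: a_2 and a_3 only move `abs()`, which gives the same result before or after dividing by a positive row number. A sketch that extends the same toy data frame with two extra columns (`a_4` and `a_5` are names made up for this illustration) mirrors the negative_1 / negative_2 pattern instead, moving the division inside `cumsum()`:

```r
library(dplyr)

a <- data.frame(x = -(1:10))

a_check <- a %>%
  mutate(
    # abs() moved around: identical, since row_number() is always positive
    a_2 = abs(cumsum(x) / row_number()),
    a_3 = abs(cumsum(x)) / row_number(),
    # negative_1 pattern: sum the absolute values first, then divide
    a_4 = cumsum(abs(x)) / row_number(),
    # negative_2 pattern: divide each value by its own row number, then sum
    a_5 = cumsum(abs(x) / row_number())
  )

a_check
```

Here a_4 runs 1, 1.5, 2, ..., 5.5 (the running mean of 1..n), while a_5 runs 1, 2, ..., 10, because each term i/i is 1.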

[Screenshot (2020-11-24): output of the a example]

Why is there a difference in my b dataset?

Any suggestions or advice would be appreciated.

Thanks!

Andrew

Hello,

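I had a look at your code. The difference between negative_1 and negative_2 is the order of operations. In negative_1 you take the cumsum first and then divide by row_number(), whereas in negative_2 the division by row_number() happens inside cumsum(), so each value is divided by its own row number before being summed. These are not the same: at row n, negative_1 is (v1 + v2 + ... + vn)/n, while negative_2 is v1/1 + v2/2 + ... + vn/n. Your test with a only moved abs() around, and abs() gives the same result before or after dividing by a positive row number, so it could not reveal this difference.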
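One way to check which column matches the intent is a sketch on made-up scores (the `toy` data frame below is hypothetical, not the original bt_sentiment_test data). Every score equals its positive part minus the absolute value of its negative part, so if the negative column is on the same "sum, then divide" scale as the others, `running_avg` should equal `positive - negative_1` on every row. `negative_2` does not satisfy that identity:

```r
library(dplyr)

# Hypothetical sentiment scores standing in for bt_sentiment_test$sentiment
toy <- data.frame(sentiment = c(2, -1, 0, -3, 1))

toy_out <- toy %>%
  mutate(
    running_avg = cumsum(sentiment) / row_number(),
    positive    = cumsum(ifelse(sentiment > 0, sentiment, 0)) / row_number(),
    # sum the absolute negative scores, then divide by the row number
    negative_1  = cumsum(abs(ifelse(sentiment < 0, sentiment, 0))) / row_number(),
    # divide each absolute negative score by its own row number, then sum
    negative_2  = cumsum(abs(ifelse(sentiment < 0, sentiment, 0)) / row_number())
  )

# negative_1 is consistent with the other two columns on every row
with(toy_out, all.equal(running_avg, positive - negative_1))  # TRUE
# negative_2 breaks the identity (it diverges from row 3 here)
with(toy_out, isTRUE(all.equal(running_avg, positive - negative_2)))  # FALSE
```

On these five scores negative_1 is 0, 0.5, 1/3, 1, 0.8, whereas negative_2 is 0, 0.5, 0.5, 1.25, 1.25.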


Indeed, you are right.

Thanks mate!

