Problems with calculating means with group_by and mutate

Greetings fellow R users,

I started using R only a week ago and got very much stuck on a problem I have hammered my head against for days now (unsuccessfully, of course).

My data includes records about the duration of activities over several years, with month and year as separate variables. I now wish to create a new variable which shows the mean duration for the respective month beside each observation, in a separate column in the same dataframe, so I can generate a nice plot later on showing said monthly average as a reference.
Here is a data example, similar to yet much smaller than my original dataset.


year <- c(2017, 2017, 2017, 2017, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019)
month <- c(6, 7, 9, 2, 2, 6, 6, 9, 7, 2, 2, 2)
duration <- c(13, 8, 15, 7, 18, NA, 13, 19, 4, 9, 11, 11)
activity <- data.frame(year, month, duration)

Created on 2020-07-21 by the reprex package (v0.3.0)

My approach to reach my goal was using the group_by() and the mutate() commands as follows. I want to keep everything inside that one dataframe because I have multiple dataframes in my project.

activity$duration_mean_month <- activity %>%
  group_by(activity$year, activity$month) %>%
  mutate(activity$duration_mean_month = mean(activity$duration, na.rm = TRUE))

But I always get the error message:

Error: unexpected '=' in:
"  group_by(activity$year, activity$month) %>%
  mutate(activity$duration_mean_month ="

For several days now I have tried to find a solution for this problem, even desperately going as far as trying to write a loop (which, with my very limited programming experience, went poorly) to solve it differently. I hope some wise soul out there will find a way to fix my code and thus expand my horizon.

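You are improperly mixing base R with tidyverse syntax. In base R you work on vectors, and you pluck them out of dataframes, or assign them into dataframes, with the $ syntax.
In the tidyverse you do not. mutate() knows that a variable is in the frame that you passed to it, so it doesn't need that info duplicated. That is also where the error comes from: the name on the left of '=' inside mutate() must be a plain column name, so R's parser stops at the '=' that follows activity$duration_mean_month. The result of the pipe is the whole dataframe with the new column added, so assign it back to activity rather than to a single column.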

activity <- activity %>%
  group_by(year, month) %>%
  mutate(duration_mean_month = mean(duration, na.rm = TRUE))
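
For reference, here is a minimal end-to-end sketch using the sample data from the question. The ungroup() call and the base R comparison with ave() are additions for illustration and not part of the fix above; the column name duration_mean_base is made up for this example. It assumes the dplyr package is installed.

```r
library(dplyr)

# Sample data from the question
activity <- data.frame(
  year     = c(2017, 2017, 2017, 2017, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019),
  month    = c(6, 7, 9, 2, 2, 6, 6, 9, 7, 2, 2, 2),
  duration = c(13, 8, 15, 7, 18, NA, 13, 19, 4, 9, 11, 11)
)

# tidyverse: bare column names; the mean is computed per year/month group
activity <- activity %>%
  group_by(year, month) %>%
  mutate(duration_mean_month = mean(duration, na.rm = TRUE)) %>%
  ungroup()  # assumption: drop the grouping so later steps are not grouped

# base R: here the $ syntax is the right tool
# (duration_mean_base is a hypothetical column name used only for comparison)
activity$duration_mean_base <- ave(
  activity$duration, activity$year, activity$month,
  FUN = function(x) mean(x, na.rm = TRUE)
)

print(activity)
```

Both new columns should hold the same values. If you want the average per calendar month across all years instead, grouping by month alone (group_by(month)) would be the variation to try.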

Thank you very much, this works!
So easy, and yet I didn't get it on my own.

What material are you using to study from?
A great (free) resource is at https://r4ds.had.co.nz/
