Mean of a variable depending on other columns' values

Hi there! I would like to know how I can transform a data frame to get the average of some variables within groups determined by other variables. In this case there are 4 groups, one for each combination of state and year. I have this data:

state <- c("Alabama", "Alabama", "Alabama", "Alabama", "Arkansas", "Arkansas", "Arkansas", "Arkansas")
year <- c(2000, 2000, 2001, 2001, 2000, 2000, 2001, 2001)
gender <- c(1, 0, 1, 0, 1, 0, 1, 0)
age65 <- c(0, 1, 0, 1, 0, 1, 0, 1)
df1 <- data.frame(state, year, gender, age65)

I would like to obtain the following data, with the average of the values of gender and age65 for each group:

state <- c("Alabama", "Alabama", "Arkansas", "Arkansas")
year <- c(2000, 2001, 2000, 2001)
gender <- c(0.5, 0.5, 0.5, 0.5)
age65 <- c(0.5, 0.5, 0.5, 0.5)
df2 <- data.frame(state, year, gender, age65)

Please note that my original dataset has many states and years, so I would like code that refers to variable names rather than to specific observation values.

Thanks!

Hi. Like this?

library(tidyverse)

state <- c(rep('Alabama', 4), rep('Arkansas', 4)) # these need to be strings
year <- c(2000, 2000, 2001, 2001, 2000, 2000, 2001, 2001)
gender <- c(1, 0, 1, 0, 1, 0, 1, 0)
age65 <- c(0, 1, 0, 1, 0, 1, 0, 1)
df1 <- data.frame(state, year, gender, age65)

df1 %>% 
  group_by(state, year) %>% 
  summarise(across(gender:age65, mean)) %>% 
  ungroup()

# A tibble: 4 x 4
  state     year gender age65
  <chr>    <dbl>  <dbl> <dbl>
1 Alabama   2000    0.5   0.5
2 Alabama   2001    0.5   0.5
3 Arkansas  2000    0.5   0.5
4 Arkansas  2001    0.5   0.5

By running this I obtain the following error:

Adding missing grouping variables: state, year
Error in context_peek():
! across() must only be used inside dplyr verbs.
Backtrace:

  1. ... %>% ungroup()
  2. plyr::summarise(., across(gender:age65, mean))
  3. [ base::eval(...) ] with 1 more call
  4. dplyr::across(gender:age65, mean)
  5. dplyr:::across_setup(...)
  6. dplyr:::peek_mask("across()")
  7. dplyr:::context_peek("mask", fun)

(I have already loaded the dplyr and plyr packages)

How could I solve it?

Do not load plyr; you are getting a name clash with dplyr::summarise(). The backtrace shows that plyr::summarise() is being called instead.
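
To confirm the clash, you can check which package the bare name summarise currently resolves to. This is only a minimal sketch: it assumes both packages are installed and that plyr was attached after dplyr. The load order is an assumption; the backtrace above only shows that plyr::summarise() was the one called.

```r
# Assumption: plyr is attached after dplyr, so its summarise() masks dplyr's
suppressPackageStartupMessages({
  library(dplyr)
  library(plyr)
})

# Namespace that the bare name summarise resolves to in this session
which_pkg <- environmentName(environment(summarise))
print(which_pkg)

# Every attached package that provides an object called summarise,
# listed in search-path order (the first one wins)
hits <- find("summarise")
print(hits)
```

If plyr comes first in that list, any unprefixed summarise() call in the script goes to plyr.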

I need the plyr package for other commands in my script. Is it possible to get code that solves the problem as @williaml did, but without having to unload the plyr package?

Thanks!

plyr is an old, superseded package, so you might want to consider updating your script. In any case, you can specify which package each function comes from, so the code would be:

df1 %>% 
    dplyr::group_by(state, year) %>% 
    dplyr::summarise(dplyr::across(gender:age65, mean)) %>% 
    dplyr::ungroup()
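
If prefixing every call is inconvenient, a base R alternative sidesteps the dplyr/plyr name clash entirely. This is a sketch rather than a drop-in replacement: it rebuilds the example df1 from this thread and uses aggregate() from the stats package that ships with R, so it works whether or not plyr is attached.

```r
# Example data from this thread
state <- c(rep("Alabama", 4), rep("Arkansas", 4))
year <- c(2000, 2000, 2001, 2001, 2000, 2000, 2001, 2001)
gender <- c(1, 0, 1, 0, 1, 0, 1, 0)
age65 <- c(0, 1, 0, 1, 0, 1, 0, 1)
df1 <- data.frame(state, year, gender, age65)

# Mean of gender and age65 for every state/year combination;
# only variable names appear, no specific states or years
df2 <- aggregate(cbind(gender, age65) ~ state + year, data = df1, FUN = mean)

# Sort by state, then year, to match the desired layout
df2 <- df2[order(df2$state, df2$year), ]
rownames(df2) <- NULL
print(df2)
```

Note that the formula interface of aggregate() drops rows with missing values in any of the listed variables by default, which may or may not be what you want on real data.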