Tidyeval solution for "negative" group_by

timpe · February 14, 2018, 4:05pm

Is there a tidyeval way to group_by() all columns in a data frame EXCEPT for the ones you don't want grouped? I often need to come up with shares over 1 dimension, so I use group_by_() to select all names except the ones I want.

This is useful for me for programming as I don't always know the names of the other columns in the data frame. My current method seems inefficient, though.

In the example below, I group by all columns except Sex and Freq (the data). This returns the shares by sex, grouped by Hair and Eye.

as.data.frame(HairEyeColor) %>% 
  group_by_(.dots = as.list(names(.)[!(names(.) %in% c("Sex", "Freq"))])) %>% 
  mutate(share = Freq / sum(Freq)) %>% 
  ungroup()

danr · February 14, 2018, 4:18pm

You should make reproducible example of your code to make it easier for us to answer your questions. See

https://www.jessemaegan.com/post/so-you-ve-been-asked-to-make-a-reprex

I think this does what are you looking for:


suppressPackageStartupMessages(library(tidyverse))
t1 <- as.data.frame(HairEyeColor) %>% 
   group_by_(.dots = as.list(names(.)[!(names(.) %in% c("Sex", "Freq"))])) %>% 
   mutate(share = Freq / sum(Freq)) %>% 
   ungroup()


t2 <- as.data.frame(HairEyeColor) %>% 
   group_by_at(vars(-Sex, -Freq)) %>% 
   mutate(share = Freq / sum(Freq)) %>% 
   ungroup()


identical(t1, t2)
#> [1] TRUE

Created on 2018-02-14 by the reprex package (v0.2.0).

timpe · February 14, 2018, 4:43pm

Thanks. This is a helpful start. I'm still trying to wrap my head about tidyeval, but is there a solution that doesn't require me to explicitly know that Sex and Freq are the names of the excluded variables, but rather, I can pass the string c("Sex", "Freq") into the vars() function?

mishabalyasin · February 14, 2018, 4:55pm

I can't recommend the webinar by Lionel Henry enough for wrapping your head around tidyeval: link
In this specific case you can do something like this:

library(tidyverse)
vars <- c("Sex", "Freq")

dots <- purrr::map(vars, function(variable){
  dot <- rlang::sym(variable)
  rlang::quo(-!!dot)
})

as.data.frame(HairEyeColor) %>% 
  group_by_at(vars(!!!dots)) %>% 
  mutate(share = Freq / sum(Freq)) %>% 
  ungroup()

# A tibble: 32 x 5
   Hair  Eye   Sex    Freq share
   <fct> <fct> <fct> <dbl> <dbl>
 1 Black Brown Male  32.0  0.471
 2 Brown Brown Male  53.0  0.445
 3 Red   Brown Male  10.0  0.385
 4 Blond Brown Male   3.00 0.429
 5 Black Blue  Male  11.0  0.550
 6 Brown Blue  Male  50.0  0.595
 7 Red   Blue  Male  10.0  0.588
 8 Blond Blue  Male  30.0  0.319
 9 Black Hazel Male  10.0  0.667
10 Brown Hazel Male  25.0  0.463
# ... with 22 more rows

timpe · February 14, 2018, 6:19pm

Thanks so much! Will definitely watch so I fully understand what's happening, but this does the trick!