How to make complete(..., nesting()) work with quosures and tidyeval

I'm trying to write a general data-summary function that keeps empty groups when summarizing data. I've been using complete() to do this with hard-coded grouping and nesting variables, but when I move the same code into a user-defined function, nesting() doesn't seem to work with quosures and tidy evaluation. Is there a way around this, or do I need a different approach? Here's an example:

library(tidyverse)

d = mtcars %>% 
  mutate_at(vars(carb,am), as.factor)

d %>% 
  group_by(carb, am) %>% 
  tally %>% 
  complete(am, nesting(carb), fill=list(n=0))

The rows with n=0 are the additional rows added by complete to reflect combinations of am and carb that don't exist in the original data.

   am    carb      n
   <fct> <fct> <dbl>
 1 0     1         3
 2 1     1         4
 3 0     2         6
 4 1     2         4
 5 0     3         3
 6 1     3         0
 7 0     4         7
 8 1     4         3
 9 0     6         0
10 1     6         1
11 0     8         0
12 1     8         1

But this approach fails if used in a function with quosures:

fnc = function(group, nest, data) {
  
  group=enquo(group)
  nest=enquo(nest)
  
  data %>% 
    group_by(!!group, !!nest) %>% 
    tally %>% 
    complete(!!group, nesting(!!nest), fill=list(n=0))
}

fnc(am, carb, d)
Error in eval_tidy(xs[[i]], unique_output) : object 'carb' not found 

Is there a way to make this approach work, or do I need to try some other method, like joining the original data with an expanded grid of group combinations?

It works using ensym() (n.b. I also reordered group_by(!!nest, !!group) just so it matched the order in your example with group_by(carb, am)). I also named the arguments for the function at the end, since it's atypical to not have data in the first position (though that's obviously just a matter of choice).

library(tidyverse)

d <- mtcars %>%
  mutate_at(vars(carb, am), as.factor)

d %>%
  group_by(carb, am) %>%
  tally() %>%
  complete(am, nesting(carb), fill = list(n = 0))
#> # A tibble: 12 x 3
#> # Groups:   carb [6]
#>    am    carb      n
#>    <fct> <fct> <dbl>
#>  1 0     1         3
#>  2 1     1         4
#>  3 0     2         6
#>  4 1     2         4
#>  5 0     3         3
#>  6 1     3         0
#>  7 0     4         7
#>  8 1     4         3
#>  9 0     6         0
#> 10 1     6         1
#> 11 0     8         0
#> 12 1     8         1

fnc <- function(group, nest, data) {
  group <- enquo(group)
  nest <- ensym(nest)

  data %>%
    group_by(!!nest, !!group) %>%
    tally() %>%
    complete(!!group, nesting(!!nest), fill = list(n = 0))
}

fnc(group = am, nest = carb, data = d)
#> # A tibble: 12 x 3
#> # Groups:   carb [6]
#>    am    carb      n
#>    <fct> <fct> <dbl>
#>  1 0     1         3
#>  2 1     1         4
#>  3 0     2         6
#>  4 1     2         4
#>  5 0     3         3
#>  6 1     3         0
#>  7 0     4         7
#>  8 1     4         3
#>  9 0     6         0
#> 10 1     6         1
#> 11 0     8         0
#> 12 1     8         1

Created on 2018-10-16 by the reprex package (v0.2.1.9000)

It also works using enexpr():

fnc <- function(group, nest, data) {
  group <- enexpr(group)
  nest <- enexpr(nest)

  data %>%
    group_by(!!nest, !!group) %>%
    tally() %>%
    complete(!!group, nesting(!!nest), fill = list(n = 0))
}

Thanks Mara! Can you explain why ensym() (or enexpr()) instead of enquo() (for both group and nest)? How can I determine when to use one or the other?

You don't have to use it for both group and nest. You only get the error for nest, the argument that ends up inside nesting().

library(tidyverse)

d <- mtcars %>%
  mutate_at(vars(carb, am), as.factor)

fnc <- function(group, nest, data) {
  group <- enquo(group)
  nest <- ensym(nest)

  data %>%
    group_by(!!nest, !!group) %>%
    tally() %>%
    complete(!!group, nesting(!!nest), fill = list(n = 0))
}

fnc(group = am, nest = carb, data = d)
#> # A tibble: 12 x 3
#> # Groups:   carb [6]
#>    am    carb      n
#>    <fct> <fct> <dbl>
#>  1 0     1         3
#>  2 1     1         4
#>  3 0     2         6
#>  4 1     2         4
#>  5 0     3         3
#>  6 1     3         0
#>  7 0     4         7
#>  8 1     4         3
#>  9 0     6         0
#> 10 1     6         1
#> 11 0     8         0
#> 12 1     8         1

As for why this happens with nesting(), I'm gonna pass that one off to Lionel, to avoid the high risk of my borking the explanation.

That's a tricky one. I see why this is happening, but I don't know how to explain it concisely. Unfortunately, I also don't see a great way for users to determine when they have to unquote symbols rather than quosured symbols.
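
For what it's worth, here is a minimal sketch of what the two capture functions return, which is the distinction at play. The helper names capture_quo and capture_sym are made up for illustration. The sketch only inspects the captured objects; it does not explain why nesting() trips over one of them.

```r
library(rlang)

# Hypothetical helpers, for illustration only
capture_quo <- function(x) enquo(x)  # captures the expression plus its environment
capture_sym <- function(x) ensym(x)  # captures a bare symbol, no environment

q <- capture_quo(carb)
s <- capture_sym(carb)

is_quosure(q)                  # TRUE
is_symbol(s)                   # TRUE
identical(quo_get_expr(q), s)  # TRUE: the quosure wraps the same symbol

# ensym() refuses anything that is not a simple name
bad <- tryCatch(capture_sym(carb + 1), error = function(e) "not a symbol")
bad
```

So unquoting a quosure hands the callee a wrapped symbol, while unquoting the result of ensym() hands it the plain symbol, as if you had typed the column name yourself.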

I can only tell you that quosures should work almost all the time, and that if your inputs represent data frame columns rather than complex expressions, it's safe to use ensym() and ensyms() instead.
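
As a rough sketch of the ensyms() route, assuming every input is a bare column name: the function name count_all is hypothetical, and it uses a plain complete() over all supplied columns instead of the nesting() call from the original question, so treat it as an illustration, not a drop-in replacement.

```r
library(tidyverse)

d <- mtcars %>%
  mutate_at(vars(carb, am), as.factor)

# Hypothetical helper: `...` must be bare column names, nothing more complex
count_all <- function(data, ...) {
  cols <- rlang::ensyms(...)  # list of symbols, one per column

  data %>%
    count(!!!cols) %>%                        # counts for combinations that occur
    complete(!!!cols, fill = list(n = 0))     # add the missing combinations with n = 0
}

res <- count_all(d, am, carb)
res
```

Because am and carb are factors, complete() fills in every combination of their levels, so the am/carb combinations that never occur in mtcars come back with n = 0.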

I have opened an issue upstream so we can try and find a solution to this problem. https://github.com/tidyverse/tidyr/issues/506
