Quick selection of many variables in tidyr::nesting()

tidyr
tidyverse

#1

Multiple variables can be used in nesting() by separating them with commas. When there are MANY variables, it would be easier to use the : idiom as it is used in select() (from dplyr). Is this possible?

For these example data (modified from the complete() documentation),

library(dplyr, warn.conflicts = FALSE)
library(tidyr)  # complete() and nesting() come from tidyr
df <- tibble(
  group = c(1:2, 1),
  item_id = c(1:2, 2),
  item_name = c("a", "b", "b"),
  something = c("c", "d", "d"),
  another = c("z", "y", "X"),
  value1 = 1:3,
  value2 = 4:6
)

This works ...

df %>% complete(group, nesting(item_id, item_name, something, another))

But I would like the following to work for a contiguous range of variables (it does not):

df %>% complete(group, nesting(item_id:another))

Of course, my use case has more than four variables for nesting().

Is there something similar to this that is possible in nesting()?


#2

I don't have an elegant solution to recommend, but hoping others do.

I've certainly run into this in other use cases. You might consider submitting this as a tidyverse feature request, perhaps as a helper function in the tidyselect package.
Here's a link to submit an issue to tidyverse/tidyselect


#3

You can use select within the nesting, and then use the !!! unquote-splice operator from rlang to provide the columns required.

library(tidyverse)
library(rlang)

df %>% 
  complete(group, nesting(!!!select(., item_id:another)))

The result matches the explicit call:

> df %>% 
+   complete(group, nesting(item_id, item_name, something, another))

# A tibble: 6 x 7
  group item_id item_name something another value1 value2
  <dbl>   <dbl> <chr>     <chr>     <chr>    <int>  <int>
1  1.00    1.00 a         c         z            1      4
2  1.00    2.00 b         d         X            3      6
3  1.00    2.00 b         d         y           NA     NA
4  2.00    1.00 a         c         z           NA     NA
5  2.00    2.00 b         d         X           NA     NA
6  2.00    2.00 b         d         y            2      5

> df %>% 
+   complete(group, nesting(!!!select(., item_id:another)))
# A tibble: 6 x 7
  group item_id item_name something another value1 value2
  <dbl>   <dbl> <chr>     <chr>     <chr>    <int>  <int>
1  1.00    1.00 a         c         z            1      4
2  1.00    2.00 b         d         X            3      6
3  1.00    2.00 b         d         y           NA     NA
4  2.00    1.00 a         c         z           NA     NA
5  2.00    2.00 b         d         X           NA     NA
6  2.00    2.00 b         d         y            2      5

#4

Thank you jakekaupp. This definitely solves the problem, though I was hoping for something that would be easier for me to explain to others. :smile: I will study up on the rlang uses in this solution. Thanks again.


#5
 df %>%  
   complete(group, nesting(!!!select(., item_id:another)))

A more thorough explanation: the !!! operator evaluates the expression that follows it and splices the result into the enclosing call as separate, comma-separated arguments. For instance, take the following:

!!!select(., item_id:another)

select() picks the columns from item_id to another out of df, which the dot placeholder refers to. !!! then splices those columns into nesting() as if you had written item_id, item_name, something, another, so the call evaluates like your original code.
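
To inspect the splice on its own, here is a minimal sketch. It assumes the df from the first post and uses rlang::expr() only to capture the call without running it; the names cols, spliced_call and arg_names are illustrative.

```r
library(dplyr, warn.conflicts = FALSE)
library(rlang)

df <- tibble(
  group = c(1:2, 1),
  item_id = c(1:2, 2),
  item_name = c("a", "b", "b"),
  something = c("c", "d", "d"),
  another = c("z", "y", "X"),
  value1 = 1:3,
  value2 = 4:6
)

# select() returns a data frame holding only the contiguous columns
cols <- select(df, item_id:another)

# expr() captures the call without evaluating it; !!! splices each column
# of `cols` into the call as its own named argument
spliced_call <- expr(nesting(!!!cols))

# Argument names of the captured call (the first element is the function name)
arg_names <- names(as.list(spliced_call))[-1]
print(arg_names)
```

The captured call has one argument per selected column, which is why nesting() behaves as if the columns had been listed by hand.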


#6

Following @jakekaupp's solution, if you wanted to use the indices instead of the variable names, you could use:

df %>%  
  complete(group, nesting(!!!.[2:5]))

#7

or

df %>%  
  complete(group, nesting(!!!select(., 2:5)))

But it is a bit more typing and select() is not necessary.


#8

Even though rlang does not appear in the list of the core packages loaded by library(tidyverse), it appears that loading it explicitly is not necessary.


#9

No, it seems neither complete() nor nesting() has "select semantics". So, tidyselect is not related here, maybe tidyr's repo is the right place.

IMHO, it's reasonable for complete() (and expand(), which is used inside complete()) to have mutate semantics, but, for nesting(), select semantics seems to fit...
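
Until nesting() has select semantics, a small wrapper can approximate them. The following is only a sketch: complete_nested() and its nest_cols argument are hypothetical names, not part of tidyr, and the helper simply combines select() with the !!! splice shown earlier in the thread.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Hypothetical helper (not a tidyr function): `nest_cols` takes a
# select()-style column specification such as item_id:another
complete_nested <- function(data, ..., nest_cols) {
  # Resolve the specification to a data frame of the chosen columns
  nested <- select(data, {{ nest_cols }})
  # Splice those columns into nesting(); other arguments pass through
  complete(data, ..., nesting(!!!nested))
}

df <- tibble(
  group = c(1:2, 1),
  item_id = c(1:2, 2),
  item_name = c("a", "b", "b"),
  something = c("c", "d", "d"),
  another = c("z", "y", "X"),
  value1 = 1:3,
  value2 = 4:6
)

result <- complete_nested(df, group, nest_cols = item_id:another)
expected <- complete(df, group, nesting(item_id, item_name, something, another))
```

Hiding the splice inside a function keeps the call site readable, at the cost of one more helper to maintain.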


#10

No, it's not, as core functionality is re-exported through dplyr, but loading it explicitly makes it easier for new users to understand where certain functions come from. When teaching informally about tidy evaluation, I've seen many users get confused about when they need to load rlang for some things (sym and syms) and when they can just stick to the tidyverse.
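
As an illustration of where syms() fits in, here is a sketch that builds the nesting columns from a character vector. It assumes the df from the first post, calls rlang::syms() with an explicit namespace so it is clear where the function lives, and uses nest_vars as an illustrative name.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

df <- tibble(
  group = c(1:2, 1),
  item_id = c(1:2, 2),
  item_name = c("a", "b", "b"),
  something = c("c", "d", "d"),
  another = c("z", "y", "X"),
  value1 = 1:3,
  value2 = 4:6
)

# Column names held as strings, e.g. computed elsewhere in a script
nest_vars <- c("item_id", "item_name", "something", "another")

# syms() turns the strings into symbols; !!! splices them into nesting()
result <- df %>%
  complete(group, nesting(!!!rlang::syms(nest_vars)))

# The explicit version from the first post, for comparison
expected <- df %>%
  complete(group, nesting(item_id, item_name, something, another))
```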


#11

Makes sense. I guess it is like the regular pipe (%>%) being available through dplyr, while the other pipes require magrittr to be loaded.