Managing lists of symbols with exprs()

dplyr
rlang

#1

While reading this Introduction to Tidyeval by @lionel I was struck by this statement:

Indirect references in quoting functions are rarely useful in scripts but they are invaluable for writing functions.

In my R scripts I find that I use indirect references in quoting functions all the time. Is this good practice?

Here's an example select and join using explicit variable names (adapted from r4ds):

library(tidyverse)
library(nycflights13)

flights2 <- flights %>% 
  select(year:day, hour, origin, dest, tailnum, carrier)

flights2 %>% 
  left_join(weather, by = c("year", "month", "day", "hour"))

When I write scripts like this I tend to abstract out the column names as a list of symbols. Typically my script has multiple data wrangling operations that depend on the same lists of variables (particularly for grouping and merging), so putting them in a list helps prevent bugs and keeps the code focused on the internal logic of the operations.

timedims  <- exprs(year, month, day, hour)
geodims   <- exprs(origin, dest)
planedims <- exprs(tailnum, carrier)

flights2 <- flights %>% 
  select(!!!timedims, !!!geodims, !!!planedims)

flights2 %>% 
  left_join(weather, by = as.character(timedims))
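The as.character() call is what turns the list of symbols into the character vector that by expects. Here is a minimal sketch of that conversion on its own, with no data involved. The name by_cols_strict is just for illustration, and the second variant assumes rlang::as_string(), which only accepts symbols and so fails loudly if something else ends up in the list:

```r
library(rlang)

# A list of bare symbols, as in the script above
timedims <- exprs(year, month, day, hour)

# as.character() converts each symbol to its name
by_cols <- as.character(timedims)

# as_string() does the same one element at a time, but errors
# if an element is not a symbol (e.g. a call such as year:day)
by_cols_strict <- unname(vapply(timedims, as_string, character(1)))
```

Either vector can then be passed as by = by_cols in a join.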

I started out using quos() but have gravitated towards exprs() because it is easier to extract the symbols as strings. The most common reason I have to do this is the by argument in *_join(). The alternative with quosures gets more complicated (is there a quos_name() function in the works?):

timedimq  <- quos(year, month, day, hour)

flights2 %>% 
  left_join(weather, by = map_chr(timedimq, quo_name))

Note that select() supports sequences such as year:day, but *_join() throws an error when the same list is converted with as.character(). I don't see a good way around this.

timedims2  <- exprs(year:day, hour) # this is more compact
flights2 <- flights %>% 
  select(!!!timedims2, !!!geodims, !!!planedims) # works here
flights2 %>% 
  left_join(weather, by = as.character(timedims2)) # doesn't work here
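One possible workaround, sketched here rather than recommended: let select() expand the sequence against a data frame and read the resulting column names back. The toy tibble below is a hypothetical stand-in for flights so the example runs without nycflights13; only its column layout matters.

```r
library(dplyr)
library(rlang)

# Hypothetical stand-in for flights: one row, same leading columns
toy <- tibble(year = 2013L, month = 1L, day = 1L, hour = 5L, origin = "EWR")

timedims2 <- exprs(year:day, hour)

# select() understands the year:day sequence, so select first
# and take the names of the result as the join keys
by_cols <- names(select(toy, !!!timedims2))
```

The resulting by_cols could then be used as left_join(weather, by = by_cols). The catch is that the sequence is resolved against one particular data frame, so the result depends on that table's column order.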

What do you use to manage lists of symbols in your scripts?

Are there any plans for the *_join() functions to pick up support for lists of symbols in the by parameter?


#2

You bring up good points. Your code is already abstracted and so is easy to change, and easy to refactor as functions.

quo_name() is not meant for transforming quosured symbols to strings. It's a general purpose deparser that should only be used for creating default names. With rlang 0.3.0, the help page of ?quo_name makes this point clearer.

If you're manipulating column names, you don't actually need quosures at all. So your workflow of keeping lists of symbols and calling as.character() for functions that take strings is good.
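The conversion also works in the other direction. As a rough sketch, assuming you would rather store the names as plain strings, rlang::syms() builds the list of symbols on demand. The toy tibble and the variable names are made up for the example:

```r
library(dplyr)
library(rlang)

# Hypothetical data, just enough columns to select and group on
toy <- tibble(year = 2013L, month = 1L, day = 1L, hour = 5L, dest = "IAH")

# Keep the column names as strings and build symbols where needed
timecols <- c("year", "month", "day", "hour")
timesyms <- syms(timecols)

selected <- select(toy, !!!timesyms)    # symbols spliced into select()
grouped  <- group_by(toy, !!!timesyms)  # and into group_by()
```

No quosures are involved at any point, and as.character(timesyms) gives back the original strings.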

Are there any plans for the *_join() functions to pick up support for lists of symbols in the by parameter?

I was thinking about that recently. I think the by parameter could be treated like the .vars parameter of mutate_at(), summarise_at() etc. Then you could either pass a character vector or a vars() specification:

df %>% left_join(df2, by = vars(starts_with("s")))

I have posted an issue: https://github.com/tidyverse/dplyr/issues/3965