Managing lists of symbols with exprs()

While reading this Introduction to Tidyeval by @lionel, I was struck by this statement:

Indirect references in quoting functions are rarely useful in scripts but they are invaluable for writing functions.

In my R scripts, however, I use indirect references in quoting functions all the time. Is this good practice?

Here's an example select and join using explicit variable names (adapted from r4ds):

library(tidyverse)
library(nycflights13)

flights2 <- flights %>% 
  select(year:day, hour, origin, dest, tailnum, carrier)

flights2 %>% 
  left_join(weather, by = c("year", "month", "day", "hour"))

When I write scripts like this, I tend to abstract the column names out into lists of symbols. My scripts typically contain several data-wrangling operations that depend on the same sets of variables (particularly for grouping and joining), so keeping those sets in lists helps prevent bugs and lets the code focus on the logic of each operation.

timedims  <- exprs(year, month, day, hour)
geodims   <- exprs(origin, dest)
planedims <- exprs(tailnum, carrier)

flights2 <- flights %>% 
  select(!!!timedims, !!!geodims, !!!planedims)

flights2 %>% 
  left_join(weather, by = as.character(timedims))
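To make the "interdependence" point concrete, here is a small sketch in which one list of symbols drives both a grouping step and the join that follows it. It assumes dplyr 1.0 or later (for the `.groups` argument) and uses a hypothetical toy tibble, `trips`, in place of the flights data.

```r
library(dplyr)
library(rlang)

# Hypothetical toy data standing in for flights; only the column names matter.
trips <- tibble(
  year      = 2013L,
  month     = c(1L, 1L, 2L),
  dep_delay = c(10, 20, 5)
)

# One list of symbols is the single source of truth for the key columns.
timedims <- exprs(year, month)

# Splice the symbols into group_by() ...
monthly <- trips %>%
  group_by(!!!timedims) %>%
  summarise(mean_delay = mean(dep_delay), .groups = "drop")

# ... and convert the same list to strings for the `by` argument.
trips_with_mean <- trips %>%
  left_join(monthly, by = as.character(timedims))
```

Because the keys are defined once, adding or dropping a key column changes the grouping and the join together.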

I started out using quos() but have gravitated towards exprs() because it is easier to extract the symbols as strings. The most common reason I need to do this is the by argument of *_join(). The quosure alternative gets complicated. Is there a quos_name() function in the works?

timedimq  <- quos(year, month, day, hour)

flights2 %>% 
  left_join(weather, by = map_chr(timedimq, quo_name))
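If you want one conversion that accepts either form, a small wrapper can unwrap quosures before converting. This is a sketch only: `syms_to_chr()` is a hypothetical helper, not an rlang function. It relies on `as_string()`, which errors on anything that is not a symbol, so a call such as `year:day` fails loudly instead of being deparsed.

```r
library(rlang)
library(purrr)

# Hypothetical helper (not part of rlang): turn a list of bare symbols,
# or of quosures wrapping bare symbols, into an unnamed character vector.
syms_to_chr <- function(x) {
  map_chr(unname(x), function(el) {
    if (is_quosure(el)) el <- quo_get_expr(el)  # unwrap quosures
    as_string(el)                               # errors unless el is a symbol
  })
}

syms_to_chr(exprs(year, month, day, hour))
syms_to_chr(quos(year, month, day, hour))
```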

Note that select() supports ranges such as year:day, but *_join() throws an error. I don't see a good way around this.

timedims2 <- exprs(year:day, hour) # this is more compact
flights2 <- flights %>% 
  select(!!!timedims2, !!!geodims, !!!planedims) # works here
flights2 %>% 
  left_join(weather, by = as.character(timedims2)) # doesn't work here: gives "year:day", "hour"
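One possible workaround, sketched here under the assumption that the range only needs to be resolved against one data frame: let select() expand the selection, then read back the column names. The tibble `toy` is hypothetical; in the script above you would call `names(select(flights2, !!!timedims2))` instead.

```r
library(dplyr)
library(rlang)

timedims2 <- exprs(year:day, hour)

# as.character() deparses year:day literally, so it is not a column name.
literal <- as.character(timedims2)

# Hypothetical stand-in data with the same key columns as flights2.
toy <- tibble(year = 2013L, month = 1L, day = 1L, hour = 5, temp = 40)

# select() expands the range; names() turns the result into join keys.
by_cols <- names(select(toy, !!!timedims2))
by_cols
```

The cost is that the keys are resolved against a specific table, so the range must mean the same columns in both tables being joined.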

What do you use to manage lists of symbols in your scripts?

Are there any plans for the *_join() functions to pick up support for lists of symbols in the by parameter?

You bring up good points. Your code is already abstracted, so it is easy to change and easy to refactor into functions.

quo_name() is not meant for transforming quosured symbols into strings. It's a general-purpose deparser that should only be used for creating default names. As of rlang 0.3.0, the ?quo_name help page makes this point clearer.
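The difference shows up as soon as the quosure wraps something other than a bare symbol. In this sketch, quo_name() happily deparses a call into a string that is not a column name, while as_string() on the underlying expression refuses anything but a symbol. (quo_name() is superseded in later rlang releases but still exported.)

```r
library(rlang)

q_sym  <- quo(year)
q_call <- quo(year:day)

# quo_name() deparses whatever it is given, including calls.
name_sym  <- quo_name(q_sym)
name_call <- quo_name(q_call)   # a deparsed call, not a column name

# as_string() insists on a symbol, so the call case fails loudly.
str_sym  <- as_string(quo_get_expr(q_sym))
str_call <- tryCatch(as_string(quo_get_expr(q_call)), error = function(e) "error")
```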

If you're manipulating column names, you actually don't need quosures at all. So your workflow of keeping lists of symbols and calling as.character() for functions that take strings is good.
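The symbol/string workflow also runs in the other direction: rlang's syms() turns a character vector into a list of symbols for splicing. A minimal round-trip sketch, assuming the column names start life as strings (for example from a config file) and using a hypothetical tibble `toy`:

```r
library(dplyr)
library(rlang)

# Column names stored as plain strings.
time_cols <- c("year", "month")

# strings -> symbols, for splicing into tidy-eval verbs
time_syms <- syms(time_cols)

toy <- tibble(year = 2013L, month = 1:2, temp = c(40, 45))
selected <- select(toy, !!!time_syms)

# symbols -> strings, for arguments such as `by` that take strings
round_trip <- as.character(time_syms)
```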

Are there any plans for the *_join() functions to pick up support for lists of symbols in the by parameter?

I was thinking about that recently. I think the by parameter could be treated like the .vars parameter of mutate_at(), summarise_at(), etc. Then you could pass either a character vector or a vars() specification:

df %>% left_join(df2, by = vars(starts_with("s")))
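The vars() form above is a proposal, not something left_join() accepts in the dplyr versions discussed here. Until then, its effect can be approximated with a wrapper that resolves the specification against `x` and joins by the resulting names. `left_join_vars()`, `df`, and `df2` below are hypothetical.

```r
library(dplyr)

# Hypothetical wrapper (not part of dplyr): resolve a vars() specification
# against the columns of x, then join by the resulting column names.
left_join_vars <- function(x, y, by) {
  by_cols <- names(select(x, !!!by))
  left_join(x, y, by = by_cols)
}

df  <- tibble(s1 = 1:3, s2 = c("a", "b", "c"), val = c(10, 20, 30))
df2 <- tibble(s1 = c(1L, 2L), s2 = c("a", "b"), extra = c(TRUE, FALSE))

res <- left_join_vars(df, df2, by = vars(starts_with("s")))
```

Resolving against `x` only is a design choice of this sketch; a real implementation would also need to check that the selected columns exist in `y`.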

I have opened an issue: "Should `by` parameters of join functions support vars() specifications?" (tidyverse/dplyr issue #3965).

Thanks @lionel!

I think that accepting symbols wherever the tidyverse currently requires strings would be helpful and more consistent. I see that vars() is a wrapper for quos(), and yet I'm often using exprs() instead. Some explanation of where quosures are actually required would be helpful.

Regarding quo_name() and its alternatives: I played around with as_string() but couldn't get it working. Perhaps you could add an example to that man page?

This would be as_string(quo_get_expr(quo)). But that seems like a strange thing to do, so I'm not sure it belongs in the docs.

I see the quo_* syntax can get ugly quickly, so I'll stick with exprs() for this application. Thanks again!

> timedims  <- exprs(year, month, day, hour)
> str(as.character(timedims))
 chr [1:4] "year" "month" "day" "hour"
> 
> timedimq  <- quos(year, month, day, hour)
> str(map_chr(timedimq, quo_name))
 Named chr [1:4] "year" "month" "day" "hour"
 - attr(*, "names")= chr [1:4] "" "" "" ""
> str(as.character(map(timedimq, quo_get_expr)))
 chr [1:4] "year" "month" "day" "hour"
> str(map_chr(map(timedimq, quo_get_expr), as_string))
 Named chr [1:4] "year" "month" "day" "hour"
 - attr(*, "names")= chr [1:4] "" "" "" ""