Passing named list to mutate (and probably other dplyr verbs)

Clearly we are all still learning here. I definitely retract my earlier statement where I said "we don't need the environment the argument list was created in." We do!

To expand upon Hadley's point, let's shoot ourselves in the foot with my (not so good) approach.

library(dplyr)
library(rlang)

mtcars_tbl <- as_tibble(mtcars)

# Say we want to use this variable in the mutate call.
important_var <- 4

foo_bad <- function(x, args) {
  # But it also happens to be defined here 
  # because the function designer felt like using it
  important_var <- 5
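  # Capture the unevaluated `list(...)` call, then pull out its arguments
  # as bare expressions. call_args() replaces the deprecated lang_args().
  args_call    <- rlang::enexpr(args)
  list_of_args <- rlang::call_args(args_call)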
  mutate(x, !!! list_of_args)
}

# No environment has been captured with list(), so the bare expressions carry
# no record of which important_var the user meant. mutate() therefore
# evaluates them in the environment it was called from, which is the inside
# of foo_bad(). Lookup starts there and finds `important_var <- 5` first,
# which is not what the user wants!

# Look how cyl2 = cyl * 5   (not cyl * 4 like we wanted)
foo_bad(mtcars_tbl, list(cyl2 = cyl * important_var))
#> # A tibble: 32 x 12
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb  cyl2
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21.0  6.00   160 110    3.90  2.62  16.5  0     1.00  4.00  4.00  30.0
#>  2  21.0  6.00   160 110    3.90  2.88  17.0  0     1.00  4.00  4.00  30.0
#>  3  22.8  4.00   108  93.0  3.85  2.32  18.6  1.00  1.00  4.00  1.00  20.0
#>  4  21.4  6.00   258 110    3.08  3.22  19.4  1.00  0     3.00  1.00  30.0
#>  5  18.7  8.00   360 175    3.15  3.44  17.0  0     0     3.00  2.00  40.0
#>  6  18.1  6.00   225 105    2.76  3.46  20.2  1.00  0     3.00  1.00  30.0
#>  7  14.3  8.00   360 245    3.21  3.57  15.8  0     0     3.00  4.00  40.0
#>  8  24.4  4.00   147  62.0  3.69  3.19  20.0  1.00  0     4.00  2.00  20.0
#>  9  22.8  4.00   141  95.0  3.92  3.15  22.9  1.00  0     4.00  2.00  20.0
#> 10  19.2  6.00   168 123    3.92  3.44  18.3  1.00  0     4.00  4.00  30.0
#> # ... with 22 more rows


# Here we are going to use quos() instead of list(), like Hadley advises
foo_good <- function(x, args) {
  mutate(x, !!! args)
}

# Importantly, we capture the environment where important_var is defined using
# quos(). The call to mutate() now KNOWS that it should be 4, not 5
# because the environment has been dragged along in the quosure
foo_good(mtcars_tbl, quos(cyl2 = cyl * important_var))
#> # A tibble: 32 x 12
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb  cyl2
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21.0  6.00   160 110    3.90  2.62  16.5  0     1.00  4.00  4.00  24.0
#>  2  21.0  6.00   160 110    3.90  2.88  17.0  0     1.00  4.00  4.00  24.0
#>  3  22.8  4.00   108  93.0  3.85  2.32  18.6  1.00  1.00  4.00  1.00  16.0
#>  4  21.4  6.00   258 110    3.08  3.22  19.4  1.00  0     3.00  1.00  24.0
#>  5  18.7  8.00   360 175    3.15  3.44  17.0  0     0     3.00  2.00  32.0
#>  6  18.1  6.00   225 105    2.76  3.46  20.2  1.00  0     3.00  1.00  24.0
#>  7  14.3  8.00   360 245    3.21  3.57  15.8  0     0     3.00  4.00  32.0
#>  8  24.4  4.00   147  62.0  3.69  3.19  20.0  1.00  0     4.00  2.00  16.0
#>  9  22.8  4.00   141  95.0  3.92  3.15  22.9  1.00  0     4.00  2.00  16.0
#> 10  19.2  6.00   168 123    3.92  3.44  18.3  1.00  0     4.00  4.00  24.0
#> # ... with 22 more rows
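
For anyone who wants to see the mechanism without dplyr in the way, here is a minimal sketch using only rlang. `inspect()` is a made-up helper for illustration, not something from the thread. It evaluates the same expression twice: once through the quosure, which uses the environment stored inside it, and once as a bare expression in the function's own frame, which is roughly the situation in foo_bad().

```r
library(rlang)

important_var <- 4

# Hypothetical helper: compares quosure evaluation with bare evaluation
inspect <- function(q) {
  # Local decoy with the same name, as in foo_bad()
  important_var <- 5

  # eval_tidy() evaluates the expression in the environment the quosure captured
  from_quosure <- eval_tidy(q)

  # Strip the environment and evaluate the bare expression in this frame instead
  from_bare <- eval(quo_get_expr(q), envir = environment())

  list(from_quosure = from_quosure, from_bare = from_bare)
}

# quo() records the environment where it is called, so important_var is 4 there
res <- inspect(quo(important_var * 2))

res$from_quosure
#> [1] 8
res$from_bare
#> [1] 10
```

The only difference between the two results is which environment the lookup starts in, and that is exactly the piece of information list() throws away and quos() keeps.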

I think this is actually really important to understand in depth, so I'm thankful for this thread!
