Clarifying question - When is a manual data mask needed in `rlang::eval_tidy`?

dplyr
tidyeval
rlang
datatable

#1

Hi everyone,

I am starting to explore tidy eval. After reading the metaprogramming chapter of Advanced R and working through some examples, I still lack a deep understanding. I have working code, but I feel that if I can understand a few examples better, then I will have a better understanding of tidy eval in general.

Besides the tidyverse I use data.table frequently, so I started to experiment. Below is a somewhat artificial example derived from a real-life use case. My main question is: when and how are data masks created, and in which scenarios should I provide the data argument to rlang::eval_tidy by hand?

suppressPackageStartupMessages(library("rlang"))
suppressPackageStartupMessages(library("data.table"))
suppressPackageStartupMessages(library("dplyr"))

dt <- data.table(x = 3)
df <- data.frame(x = 3)

col_expr <- expr(x * 2)
col_quo <- quo(x * 2)

quo(dt[, `:=`("y", !!col_expr)][]) %>% 
    eval_tidy()
#>    x y
#> 1: 3 6

# x is not available in col_quo's environment
quo(dt[, `:=`("y", !!col_quo)][]) %>% 
    eval_tidy()
#> Error in ~x * 2: object 'x' not found

# works as expected
quo(dt[, `:=`("y", !!col_quo)][]) %>% 
    eval_tidy(data = dt)
#>    x y
#> 1: 3 6

# no need for manual data mask with dplyr
quo(mutate(df, y = !!col_quo)) %>% 
    eval_tidy()
#>   x y
#> 1 3 6

devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.5.0 (2018-04-23)
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  tz       Europe/Budapest             
#>  date     2018-07-18
#> Packages -----------------------------------------------------------------
#>  package    * version    date       source                         
#>  assertthat   0.2.0      2017-04-11 CRAN (R 3.5.0)                 
#>  backports    1.1.2      2017-12-13 CRAN (R 3.5.0)                 
#>  base       * 3.5.0      2018-04-24 local                          
#>  bindr        0.1.1      2018-03-13 CRAN (R 3.5.0)                 
#>  bindrcpp   * 0.2.2      2018-03-29 CRAN (R 3.5.0)                 
#>  compiler     3.5.0      2018-04-24 local                          
#>  data.table * 1.11.4     2018-05-27 CRAN (R 3.5.0)                 
#>  datasets   * 3.5.0      2018-04-24 local                          
#>  devtools     1.13.5     2018-02-18 CRAN (R 3.5.0)                 
#>  digest       0.6.15     2018-01-28 CRAN (R 3.5.0)                 
#>  dplyr      * 0.7.6      2018-06-29 CRAN (R 3.5.1)                 
#>  evaluate     0.10.1     2017-06-24 CRAN (R 3.5.0)                 
#>  glue         1.2.0.9000 2018-05-22 Github (tidyverse/glue@7230ed2)
#>  graphics   * 3.5.0      2018-04-24 local                          
#>  grDevices  * 3.5.0      2018-04-24 local                          
#>  htmltools    0.3.6      2017-04-28 CRAN (R 3.5.0)                 
#>  knitr        1.20       2018-02-20 CRAN (R 3.5.0)                 
#>  magrittr     1.5        2014-11-22 CRAN (R 3.5.0)                 
#>  memoise      1.1.0      2017-04-21 CRAN (R 3.5.0)                 
#>  methods    * 3.5.0      2018-04-24 local                          
#>  pillar       1.2.3      2018-05-25 cran (@1.2.3)                  
#>  pkgconfig    2.0.1      2017-03-21 CRAN (R 3.5.0)                 
#>  purrr        0.2.5      2018-05-29 CRAN (R 3.5.0)                 
#>  R6           2.2.2      2017-06-17 CRAN (R 3.5.0)                 
#>  Rcpp         0.12.17    2018-05-18 CRAN (R 3.5.0)                 
#>  rlang      * 0.2.0.9001 2018-06-16 Github (r-lib/rlang@ba4fb06)   
#>  rmarkdown    1.10       2018-06-11 cran (@1.10)                   
#>  rprojroot    1.3-2      2018-01-03 CRAN (R 3.5.0)                 
#>  stats      * 3.5.0      2018-04-24 local                          
#>  stringi      1.2.2      2018-05-02 CRAN (R 3.5.0)                 
#>  stringr      1.3.1      2018-05-10 CRAN (R 3.5.0)                 
#>  tibble       1.4.2      2018-01-22 CRAN (R 3.5.0)                 
#>  tidyselect   0.2.4      2018-02-26 CRAN (R 3.5.0)                 
#>  tools        3.5.0      2018-04-24 local                          
#>  utils      * 3.5.0      2018-04-24 local                          
#>  withr        2.1.2      2018-03-15 CRAN (R 3.5.0)                 
#>  yaml         2.1.19     2018-05-01 CRAN (R 3.5.0)

Created on 2018-07-18 by the reprex package (v0.2.0).

I suspect one of the differences is that dplyr supports tidy eval natively while data.table does not.

I would appreciate any explanation or a link to a resource.

regards,
Ildi


#2

Since you've already read the material in Advanced R, I'm lifting some relevant sections from the "Create a data mask" section of the rlang documentation.

Many R functions evaluate quoted expressions in a data mask so these expressions can refer to objects within the user data.

Obviously not a home run answer, since "many R functions" could mean a lot of things, but ¯\(°_o)/¯

Most of the time you can just call eval_tidy() with user data and the data mask will be constructed automatically.

^^ is in the preface describing that, largely, the manual construction of data masks is meant for developers of tidy eval interfaces, as opposed to users.
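To make the "constructed automatically" part concrete, here is a minimal sketch (my own example, not taken from the rlang documentation). The names df, multiplier, and col_quo are just illustrative. Passing a data frame as data is enough for eval_tidy() to build the mask: x is found in the data, while multiplier is found in the environment captured by the quosure.

```r
library(rlang)

df <- data.frame(x = 3)
multiplier <- 2  # lives in the calling environment, not in df

# The quosure captures both the expression and the current environment
col_quo <- quo(x * multiplier)

# eval_tidy() builds the data mask from df behind the scenes:
# `x` is looked up in the mask, `multiplier` in the quosure's environment
res <- eval_tidy(col_quo, data = df)

# The .data pronoun makes the column lookup explicit
res_pronoun <- eval_tidy(quo(.data$x * multiplier), data = df)
```

Both calls should give the same value, since they differ only in how explicitly the column is referenced.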

There are three main use cases for manual creation of data masks:

  1. When eval_tidy() is called with the same data in a tight loop. Tidy eval data masks are a bit expensive to build so it is best to construct it once and reuse it the other times for optimal performance.
  2. When several expressions should be evaluated in the same environment because a quoted expression might create new objects that can be referred in other quoted expressions evaluated at a later time.
  3. When your data mask requires special features. For instance the data frame columns in dplyr data masks are implemented with active bindings.
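The first two use cases can be sketched roughly as follows. This is my own illustration and assumes that as_data_mask() can be called with just the data, which is how I understand its interface; the object names (mask, exprs_to_run, total) are made up for the example.

```r
library(rlang)

df <- data.frame(x = c(1, 2, 3))

# Build the mask once instead of letting eval_tidy() rebuild it each time
mask <- as_data_mask(df)

# Use case 1: evaluate several expressions against the same mask
exprs_to_run <- list(quote(sum(x)), quote(mean(x)), quote(max(x)))
results <- vapply(
  exprs_to_run,
  function(e) eval_tidy(e, data = mask),
  numeric(1)
)

# Use case 2: an object created by one expression is visible to a later one,
# because both are evaluated in the same mask
eval_tidy(quote(y <- x * 2), data = mask)
total <- eval_tidy(quote(sum(y)), data = mask)
```

For one-off evaluations this is more ceremony than just passing data = df; it only pays off in the situations listed above.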

This thread contains some good info re. providing the .data pronoun manually (referencing Programming with dplyr | Programming recipes):


#3

Thanks a lot, @mara, the rlang documentation is definitely a goldmine!

One thing I already learned from you: data masks can be lists, not just data frames.
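A tiny sketch of what I mean, using made-up names a and b: a plain named list works as the data argument just like a data frame does.

```r
library(rlang)

# A named list is enough to serve as the data for the mask
values <- list(a = 1, b = 2)

res <- eval_tidy(quo(a + b), data = values)
```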

I definitely still have a lot to learn, though. I do not yet understand the points below.


#4

This answer by @lionel, the developer of rlang, to another Stack Overflow question about data.table and tidy eval was really helpful to me, so I am linking it in case others find it helpful too: limitation of quosures with non-tidy-eval functions
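For anyone landing here later, this is a minimal sketch of the workaround as I understand it; treat it as my reading rather than the official recommendation. Functions that do not use tidy eval, such as data.table's `[`, do not know how to evaluate a quosure, so one option is to unwrap it into a plain expression with quo_squash() before unquoting. Note that this drops the quosure's environment, so it is only safe when the expression refers to columns or to objects visible where the call is evaluated.

```r
library(rlang)
library(data.table)

dt <- data.table(x = 3)
col_quo <- quo(x * 2)

# Drop the quosure wrapper, keeping only the bare expression `x * 2`
# (the captured environment is lost at this point)
col_expr <- quo_squash(col_quo)

# data.table now sees an ordinary expression in `j`, so no data mask is needed
res <- eval_tidy(quo(dt[, `:=`("y", !!col_expr)][]))
```

This mirrors the first working call in the original post, where col_expr was created with expr() directly.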