Quasiquotation inside a formula

dlm · September 25, 2018, 1:53am

How would one unquote components of a formula, as the code below is (I hope obviously) intended to do?


quasi_lmer <- function(data, response, group) {
  response <- enexpr(response)
  group <- enexpr(group)
  
  data %>% 
    lme4::lmer(!!response ~ (1 | !!group), data = .)
}

tibble(
  batch = rep(1:3, each = 4),
  y = rnorm(12, mean = batch)) %>% 
    quasi_lmer(y, batch)

joels · September 25, 2018, 6:56am

I'm not sure if this is the "right" way to do it (or even a good way), but you can construct a formula using paste and quo_text:

library(tidyverse)
library(rlang)

quasi_lmer <- function(data, response, group) {
  
  response <- enexpr(response)
  group <- enexpr(group)
  
  form = paste(quo_text(response), " ~  (1|", quo_text(group), ")")
  
  # Or this
  #form = as.formula(paste(quo_text(response), " ~  (1|", quo_text(group), ")"))
  
  data %>% lme4::lmer(form, data = .)
}

set.seed(2)
tibble(
  batch = rep(1:3, each = 4),
  y = rnorm(12, mean = batch)) %>% 
  quasi_lmer(y, batch)

Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | batch)
   Data: .
REML criterion at convergence: 36.4223
Random effects:
 Groups   Name        Std.Dev.
 batch    (Intercept) 1.3693  
 Residual             0.9189  
Number of obs: 12, groups:  batch, 3
Fixed Effects:
(Intercept)  
      2.293

Although in my work I've written a fair number of functions that use tidyeval for programming with dplyr, I have to admit, I still don't really get it and I find it painful and confusing except in the simplest use cases. In this case, it seems like it would just be easier to pass strings as arguments and paste together a model formula without bringing non-standard evaluation into the picture. But maybe tidyeval has advantages I'm not aware of. If so, I hope someone will come along and show us how it's done.

dlm · September 25, 2018, 10:33am

Thanks a lot Fren

Tidyeval does seem a little like the dark arts.

And thanks for the additional thoughts. Not sure I follow why passing strings might be easier here though (now that you've found an rlang solution). Is it because there are no dplyr functions?

joels · September 25, 2018, 3:59pm

It just seems like unnecessary baggage to bring in tidyeval. For example, you could just do this:

quasi_lmer <- function(data, response, group) {
  
  form = paste(response, " ~  (1|", group, ")")
  data %>% lme4::lmer(form, data = .)
}

tibble(
  batch = rep(1:3, each = 4),
  y = rnorm(12, mean = batch)) %>% 
  quasi_lmer("y", "batch")

Even with an unknown number of fixed and random effects, you can still use strings. For example:

quasi_lmer <- function(data, response, ...) {
  
  ivs = list(...)
  
  form = paste(response, " ~ ", paste(ivs, collapse=" + "))
  
  data %>% lme4::lmer(form, data = .)
}

quasi_lmer(iris, "Sepal.Width", "Sepal.Length", "Petal.Length", "(1|Species)")
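
As a possible alternative to pasting the pieces together by hand, base R's reformulate() builds a formula object from character term labels. A minimal sketch, assuming the same string arguments as above (string_formula is a hypothetical helper name, not something from this thread):

```r
# Hypothetical helper: build a model formula from strings with stats::reformulate()
string_formula <- function(response, ...) {
  ivs <- c(...)  # character vector of term labels
  reformulate(ivs, response = response)
}

f <- string_formula("Sepal.Width", "Sepal.Length", "Petal.Length", "(1|Species)")
# f is a formula object and could be passed on, e.g. lme4::lmer(f, data = iris)
```

The result is a real formula rather than a string, so it can be inspected with all.vars() or update() before it is handed to the model function.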

As I said earlier, maybe there's some way in which tidyeval increases flexibility or adds other advantages. If there is, I'd be interested in some examples.

If you wanted to use tidyeval, you could do something like the code below, though, once again, I don't know if this is the "right" way to generate model formulas with non-standard evaluation.

quasi_lmer <- function(data, response, ...) {
  
  response = enquo(response)
  ivs = enquos(...)
  
  ivs = paste(map(ivs, quo_text), collapse=" + ")
  
  form = paste(quo_text(response), " ~ ", ivs)
  
  data %>% lme4::lmer(form, data = .)
}

quasi_lmer(iris, Sepal.Width, Sepal.Length, Petal.Length, (1|Species))
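
The same variadic interface can probably be had without going through text at all. The sketch below assumes rlang's new_formula() and quo_get_expr(), and folds the captured terms together with +; tidy_formula is a hypothetical name used only for illustration:

```r
library(rlang)

# Hypothetical helper: assemble a formula from captured arguments without text
tidy_formula <- function(response, ...) {
  env <- caller_env()                        # environment the function was called from
  response <- quo_get_expr(enquo(response))  # bare expression for the left-hand side
  terms <- lapply(enquos(...), quo_get_expr) # bare expressions for each term
  # Fold the terms into a single call: term1 + term2 + ...
  rhs <- Reduce(function(x, y) expr(!!x + !!y), terms)
  new_formula(response, rhs, env = env)
}

f <- tidy_formula(Sepal.Width, Sepal.Length, Petal.Length, (1 | Species))
# f could then be passed on, e.g. lme4::lmer(f, data = iris)
```

This sketch expects at least one term in `...`; handling an empty right-hand side is left out.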

jbannon · September 25, 2018, 7:49pm

I hope NSE experts chime in here. I kept trying to do something similar and after multiple attempts defaulted to paste.

edgararuiz · September 25, 2018, 7:59pm

Hi, I propose taking advantage of dplyr's pipe-friendly verbs to handle any column selection before piping into your function. That way you only need to pass the outcome field to it.

purrr's map() and reduce() functions can be used to avoid walking the formula back and forth through text. You can map() each non-outcome field into its own symbol (sym()) and then create the standalone (1 | field) term you need. The last step is to concatenate all of the terms into a single expression, joining them with +, using reduce().

library(rlang, warn.conflicts = FALSE)
library(purrr, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(lme4, warn.conflicts = FALSE)
#> Loading required package: Matrix

quasi_lmer <- function(.data, .x){
  out_var <- enquo(.x)
  all_fields <- colnames(.data)
  iout <- all_fields == quo_text(out_var)
  out <- all_fields[iout]
  pred <- all_fields[!iout]  # drop the outcome; -iout only worked when it was the first column
  preds <- map(pred, ~ sym(.x))
  preds <- map(preds, ~ expr((1 | !! .x)))
  preds <- reduce(
    preds, 
    function(x, y) expr(!! x + !! y)
    )
  f <- expr(!! sym(quo_name(out_var)) ~ !! preds)  # preds is a single call after reduce()
  lme4::lmer(f, data = .data)
}

sleepstudy %>%
  quasi_lmer(Reaction)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: Reaction ~ (1 | Days) + (1 | Subject)
#>    Data: .data
#> REML criterion at convergence: 1819.738
#> Random effects:
#>  Groups   Name        Std.Dev.
#>  Subject  (Intercept) 37.09   
#>  Days     (Intercept) 31.17   
#>  Residual             31.43   
#> Number of obs: 180, groups:  Subject, 18; Days, 10
#> Fixed Effects:
#> (Intercept)  
#>       298.5

Created on 2018-09-25 by the reprex package (v0.2.0).
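
For the two-argument function in the original question, a shorter route may be rlang::inject(), which processes !! in a call before evaluating it. This is only a sketch: it assumes rlang 0.4.9 or later (where inject() is available) and that lme4 is installed, and it uses ensym() on the assumption that both arguments are bare column names.

```r
library(rlang)

# Sketch: inject() resolves !! first, then evaluates the resulting lmer() call
quasi_lmer <- function(data, response, group) {
  response <- ensym(response)  # column names captured as symbols
  group <- ensym(group)
  inject(lme4::lmer(!!response ~ (1 | !!group), data = data))
}

set.seed(2)
batch <- rep(1:3, each = 4)
df <- data.frame(batch = batch, y = rnorm(12, mean = batch))
fit <- quasi_lmer(df, y, batch)
```

Because the symbols are spliced in before lmer() sees the call, the formula stored in the fitted model reads y ~ (1 | batch) rather than containing the !! operators.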

dlm · September 27, 2018, 11:47am

Big thanks, both for answering and for expanding on the answer to show a deeper level of tidy evaluation. Lots of useful new nuggets to glean insight from here

Very much appreciated!

May I ask why use the quo functions, rather than the expr variants here?

Your example also works when quo is everywhere replaced with expr, viz

quasi_lmer <- function(.data, .x){
  # out_var <- enquo(.x)
  out_var <- enexpr(.x)
  all_fields <- colnames(.data)
  # iout <- all_fields == quo_text(out_var)
  iout <- all_fields == expr_text(out_var)
  out <- all_fields[iout]
  pred <- all_fields[!iout]  # drop the outcome; -iout only worked when it was the first column
  preds <- map(pred, ~ sym(.x))
  preds <- map(preds, ~ expr((1 | !! .x)))
  preds <- reduce(
    preds, 
    function(x, y) expr(!! x + !! y)
    )
  # f <- expr(!! sym(quo_name(out_var)) ~ !!! preds)
  f <- expr(!! sym(expr_name(out_var)) ~ !! preds)  # preds is a single call after reduce()
  lme4::lmer(f, data = .data)
}

sleepstudy %>%
  quasi_lmer(Reaction)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: Reaction ~ (1 | Days) + (1 | Subject)
#>    Data: .data
#> REML criterion at convergence: 1820
#> Random effects:
#>  Groups   Name        Std.Dev.
#>  Subject  (Intercept) 37.1    
#>  Days     (Intercept) 31.2    
#>  Residual             31.4    
#> Number of obs: 180, groups:  Subject, 18; Days, 10
#> Fixed Effects:
#> (Intercept)  
#>         299

Could you show a situation in which the quo functions are needed when the expr variants don't suffice?

edgararuiz · September 27, 2018, 12:56pm

Hi, I'm glad it helps. And yes, for building expressions inside the function, expr() is what's recommended. For capturing the arguments that users pass in, use enquo(). In other words, try not to use enexpr(). That's something I recently learned from @lionel, so maybe he can offer some more background. My understanding is that enquo() retains the environment your function was called from, which prevents picking up the wrong variable if others with the same name exist elsewhere in your R session. Additionally, since I used enquo() to capture the argument, I have to use the related "quo" functions to extract the text or name from it.

lionel · September 27, 2018, 6:07pm

That's correct, the general rule is to use enquo() to capture expressions that are not yours and expr() to build your own expressions. We generally don't need to create quosures of our own context because the quosures are created automatically down the line by other quoting functions (if they properly use enquo() or enquos()).

tom_greenwood · May 21, 2019, 7:59am

My understanding was that you can usually get away with enexpr if your function is only going to be called in the global environment, but if another function is going to call your function then you probably need to use enquo, because otherwise R will look for your variables in the wrong place. Is that roughly right? Thanks

lionel · May 21, 2019, 8:14am

The place where your function was created (global environment, package namespace, another function's environment) also matters. Symbols pointing to objects in the global environment might resolve properly, but they might also be masked by other objects on the way there. enexpr() should almost never be used.
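
The masking problem described above can be sketched with two hypothetical functions that differ only in how they capture their argument. Both happen to define an internal variable k, and the caller also has a k of their own. The names times_expr and times_quo are made up for this illustration.

```r
library(rlang)

# Captures a bare expression: the caller's environment is lost
times_expr <- function(data, x) {
  x <- enexpr(x)
  k <- 100             # internal variable that happens to share a name
  eval_tidy(x, data)   # a bare expression is evaluated in this function's frame
}

# Captures a quosure: the expression travels with the caller's environment
times_quo <- function(data, x) {
  x <- enquo(x)
  k <- 100
  eval_tidy(x, data)   # symbols not in data resolve where the user wrote them
}

df <- data.frame(y = 1:3)
k <- 2

res_expr <- times_expr(df, y * k)  # picks up the internal k (100)
res_quo  <- times_quo(df, y * k)   # picks up the caller's k (2)
```

With enexpr() the user's k is silently replaced by the function's own k, which is the kind of wrong-variable lookup that enquo() is meant to prevent.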