Quasiquotation inside a formula

How can I unquote components of a formula? The (hopefully obvious) intention is shown here:


quasi_lmer <- function(data, response, group) {
  response <- enexpr(response)
  group <- enexpr(group)
  
  data %>% 
    lme4::lmer(!!response ~ (1 | !!group), data = .)
}

tibble(
  batch = rep(1:3, each = 4),
  y = rnorm(12, mean = batch)) %>% 
    quasi_lmer(y, batch)

I'm not sure if this is the "right" way to do it (or even a good way), but you can construct a formula using paste and quo_text:

library(tidyverse)
library(rlang)

quasi_lmer <- function(data, response, group) {
  
  response <- enexpr(response)
  group <- enexpr(group)
  
  form = paste(quo_text(response), " ~  (1|", quo_text(group), ")")
  
  # Or this
  #form = as.formula(paste(quo_text(response), " ~  (1|", quo_text(group), ")"))
  
  data %>% lme4::lmer(form, data = .)
}
set.seed(2)
tibble(
  batch = rep(1:3, each = 4),
  y = rnorm(12, mean = batch)) %>% 
  quasi_lmer(y, batch)
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ (1 | batch)
   Data: .
REML criterion at convergence: 36.4223
Random effects:
 Groups   Name        Std.Dev.
 batch    (Intercept) 1.3693  
 Residual             0.9189  
Number of obs: 12, groups:  batch, 3
Fixed Effects:
(Intercept)  
      2.293  

Although in my work I've written a fair number of functions that use tidyeval for programming with dplyr, I have to admit, I still don't really get it and I find it painful and confusing except in the simplest use cases. In this case, it seems like it would just be easier to pass strings as arguments and paste together a model formula without bringing non-standard evaluation into the picture. But maybe tidyeval has advantages I'm not aware of. If so, I hope someone will come along and show us how it's done.

Thanks a lot Fren

Tidyeval does seem a little like the dark arts

And thanks for the additional thoughts. Not sure I follow why passing strings might be easier here though (now that you've found an rlang solution). Is it because there are no dplyr functions?

It just seems like unnecessary baggage to bring in tidyeval. For example, you could just do this:

quasi_lmer <- function(data, response, group) {
  
  form = paste(response, " ~  (1|", group, ")")
  data %>% lme4::lmer(form, data = .)
}

tibble(
  batch = rep(1:3, each = 4),
  y = rnorm(12, mean = batch)) %>% 
  quasi_lmer("y", "batch")

Even with an unknown number of fixed and random effects, you can still use strings. For example:

quasi_lmer <- function(data, response, ...) {
  
  ivs = list(...)
  
  form = paste(response, " ~ ", paste(ivs, collapse=" + "))
  
  data %>% lme4::lmer(form, data = .)
}

quasi_lmer(iris, "Sepal.Width", "Sepal.Length", "Petal.Length", "(1|Species)")
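
As a side note on the string approach above: base R's stats::reformulate() assembles a formula from character vectors, so the manual paste() step can be skipped. A minimal sketch, assuming the same iris columns as in the call above (the names `terms_chr` and `f` are only for illustration):

```r
# Term labels as strings, including the random-effect term
terms_chr <- c("Sepal.Length", "Petal.Length", "(1|Species)")

# reformulate() joins the terms with `+` and attaches the response
f <- reformulate(terms_chr, response = "Sepal.Width")
print(f)

# The formula could then be passed on, e.g. lme4::lmer(f, data = iris)
```

Whether this is preferable to paste() is a matter of taste; it mainly saves the explicit string concatenation.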

As I said earlier, maybe there's some way in which tidyeval increases flexibility or adds other advantages. If there is, I'd be interested in some examples.

If you wanted to use tidyeval, you could do something like the code below, though, once again, I don't know if this is the "right" way to generate model formulas with non-standard evaluation.

quasi_lmer <- function(data, response, ...) {
  
  response = enquo(response)
  ivs = enquos(...)
  
  ivs = paste(map(ivs, quo_text), collapse=" + ")
  
  form = paste(quo_text(response), " ~ ", ivs)
  
  data %>% lme4::lmer(form, data = .)
}

quasi_lmer(iris, Sepal.Width, Sepal.Length, Petal.Length, (1|Species))

I hope NSE experts chime in here. I kept trying to do something similar and after multiple attempts defaulted to paste.

Hi, I propose to take advantage of dplyr pipe commands and address any data selection before piping into your function, that way you only need to pass the outcome field to it.

purrr's map() and reduce() functions can be used to avoid converting the formula to text and back. You can map() each non-outcome field to a symbol (sym()) and then create the standalone (1 | field) term you need. The last step is to concatenate all of the terms into a single expression, joining them with +, using reduce().

library(rlang, warn.conflicts = FALSE)
library(purrr, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
library(lme4, warn.conflicts = FALSE)
#> Loading required package: Matrix

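quasi_lmer <- function(.data, .x){
  out_var <- enquo(.x)
  all_fields <- colnames(.data)
  # Logical vector marking the outcome column
  iout <- all_fields == quo_text(out_var)
  out <- all_fields[iout]
  # Negate with `!`; `-iout` only worked when the outcome was the first column
  pred <- all_fields[!iout]
  preds <- map(pred, ~ sym(.x))
  preds <- map(preds, ~ expr((1 | !! .x)))
  preds <- reduce(
    preds, 
    function(x, y) expr(!! x + !! y)
    )
  # After reduce(), `preds` is a single call, so unquote with `!!` rather than splicing with `!!!`
  f <- expr(!! sym(quo_name(out_var)) ~ !! preds)
  lme4::lmer(f, data = .data)
}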

sleepstudy %>%
  quasi_lmer(Reaction)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: Reaction ~ (1 | Days) + (1 | Subject)
#>    Data: .data
#> REML criterion at convergence: 1819.738
#> Random effects:
#>  Groups   Name        Std.Dev.
#>  Subject  (Intercept) 37.09   
#>  Days     (Intercept) 31.17   
#>  Residual             31.43   
#> Number of obs: 180, groups:  Subject, 18; Days, 10
#> Fixed Effects:
#> (Intercept)  
#>       298.5

Created on 2018-09-25 by the reprex package (v0.2.0).
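
For the two-argument function in the original question, the same idea (building the formula from symbols rather than text) might be sketched as follows. This is only a sketch under assumptions: `build_re_formula` is a hypothetical helper name, it uses rlang's ensym() to capture bare column names and new_formula() to assemble the result, and it handles a single grouping variable only.

```r
library(rlang)

# Hypothetical helper: builds `response ~ (1 | group)` from bare column names
build_re_formula <- function(response, group) {
  response <- ensym(response)   # capture the argument as a symbol
  group <- ensym(group)
  rhs <- expr((1 | !!group))    # build the random-intercept term
  # Create a formula object whose environment is the caller's
  new_formula(response, rhs, env = caller_env())
}

f <- build_re_formula(y, batch)
print(f)

# The result could then be passed on, e.g. lme4::lmer(f, data = my_data),
# where `my_data` stands for a data frame with columns y and batch
```

Returning a real formula object, rather than an unevaluated call, keeps the model-fitting step independent of how the formula was assembled.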

Big thanks, both for answering and for expanding on the answer to show a deeper level of tidy evaluation. Lots of useful new nuggets to glean insight from here

Very much appreciated!

May I ask why use the quo functions, rather than the expr variants here?

Your example also works when quo is everywhere replaced with expr, viz

quasi_lmer <- function(.data, .x){
  # out_var <- enquo(.x)
  out_var <- enexpr(.x)
  all_fields <- colnames(.data)
  # iout <- all_fields == quo_text(out_var)
  iout <- all_fields == expr_text(out_var)
  out <- all_fields[iout]
  # Negate with `!`; `-iout` only worked when the outcome was the first column
  pred <- all_fields[!iout]
  preds <- map(pred, ~ sym(.x))
  preds <- map(preds, ~ expr((1 | !! .x)))
  preds <- reduce(
    preds, 
    function(x, y) expr(!! x + !! y)
    )
  # f <- expr(!! sym(quo_name(out_var)) ~ !!! preds)
  # `preds` is a single call here, so it is unquoted with `!!`
  f <- expr(!! sym(expr_name(out_var)) ~ !! preds)
  lme4::lmer(f, data = .data)
}

sleepstudy %>%
  quasi_lmer(Reaction)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: Reaction ~ (1 | Days) + (1 | Subject)
#>    Data: .data
#> REML criterion at convergence: 1820
#>  Random effects:
#>  Groups   Name        Std.Dev.
#>  Subject  (Intercept) 37.1    
#>  Days     (Intercept) 31.2    
#>  Residual             31.4    
#> Number of obs: 180, groups:  Subject, 18; Days, 10
#> Fixed Effects:
#> (Intercept)  
#>         299  

Could you show a situation in which the quo functions are needed when the expr variants don't suffice?

Hi, I'm glad it helps. And yes, for quoting within the function, expr() is what's recommended. For capturing the user's argument, use enquo(). In other words, try not to use enexpr(). That's something I recently learned from @lionel, so maybe he can offer some more background. My understanding is that enquo() retains the environment from which your function was called, which prevents picking up the wrong .x variable if others with the same name exist in your R session. Additionally, since I used enquo() to capture the argument, I have to use the related "quo" functions to extract the text or name from it.

That's correct, the general rule is to use enquo() to capture expressions that are not yours and expr() to build your own expressions. We generally don't need to create quosures of our own context because the quosures are created automatically down the line by other quoting functions (if they properly use enquo() or enquos()).

My understanding was that you can usually get away with enexpr if your function is only going to be called in the global environment, but if another function is going to call your function then you probably need to use enquo, because otherwise R will look for your variables in the wrong place. Is that roughly right? Thanks

The place where your function was created (global environment, package namespace, another function's environment) also matters. Symbols pointing to objects in the global environment might resolve properly, but they might also be masked by other objects on the way there. enexpr() should almost never be used.
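
The difference discussed above can be illustrated with a small sketch. All names here (`with_enquo`, `with_enexpr`, `caller`, `bump`) are hypothetical, and rlang's eval_tidy() stands in for whatever function eventually evaluates the captured argument:

```r
library(rlang)

# Captures the argument as a quosure (expression + calling environment)
with_enquo <- function(data, x) {
  x <- enquo(x)
  eval_tidy(x, data)
}

# Captures the bare expression only; the calling environment is lost
with_enexpr <- function(data, x) {
  x <- enexpr(x)
  eval_tidy(x, data)
}

bump <- 0  # a global object with the same name as the local one below

caller <- function(f) {
  bump <- 100  # local to caller()
  f(data.frame(a = 1:3), a + bump)
}

res_quo  <- caller(with_enquo)   # `bump` is found in caller(): 100
res_expr <- caller(with_enexpr)  # `bump` is found in the global environment: 0
```

With the quosure, `bump` resolves where the expression was written. With the bare expression, lookup starts inside the capturing function and continues through the environment where that function was defined, so a different `bump` is found (or none at all).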