Why tidy evaluation instead of passing variable names as strings?

OK, I hope I don't get banned for this question :sweat_smile: but suppose I want to write a function that must work with variables whose names are known only at runtime. The function would take a data frame as input, and the names of its columns are defined at runtime. The function should use tools from dplyr, tidyr (especially gather) and ggplot.

What's the advantage of using tidyeval and rlang to manage that, instead of passing variable names as strings, which is what aes_string() currently offers, for example? Since the tidyverse is moving toward deprecating all methods that use strings to identify variable names (if I understood correctly), I would like to understand the advantages of the tidyeval-based approach with respect to passing variable names as strings.

I strongly think that in R, variables should mostly be columns of data frames (scalars being one common exception), and column names should be passed as strings (carrying an environment does not help or improve things in the case of columns).

I have some tooling based on the above principles.

wrapr::let() is a function for performing name substitutions (it actually pre-dates rlang, and I think it is a more concise solution), and seplyr is a package showing how a standard (string-value) oriented version of dplyr would work (it uses rlang internally, and I think it really shows that you do not have to expose rlang to users to get similar power).

Interesting, I didn't know about wrapr and seplyr. Is seplyr a wrapper around dplyr, so that it always uses the latest dplyr version? It seems pretty useful. Concerning let, I guess the idea is to write parametric functions by wrapping the whole function body in the let statement, right? Example:

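parametrized_plot <- function(xvariable, yvariable, ...) {
  # X and Y are placeholders; let() replaces them with the names
  # stored in xvariable and yvariable before evaluating the block
  wrapr::let(
    list(X = xvariable, Y = yvariable),
    {
      title = paste(yvariable, "vs", xvariable)
      plot(X, Y, main = title)
    }
  )
}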


set.seed(1234)
xvar = runif(100) - 0.5
yvar = dnorm(xvar)

xvariable <- "xvar"
yvariable <- "yvar"

parametrized_plot(xvariable, yvariable)

BTW, nice vignettes, I like the graphic style. That's R Markdown html_vignette, right?

seplyr is a thin rlang wrapper around dplyr. So it gives you the ability to control variable names via rlang without the user having to get involved with rlang. It uses the version of dplyr you have installed, so it should work with the latest version.
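To make the "thin rlang wrapper" idea concrete, here is a minimal sketch of what a string-oriented wrapper over dplyr can look like. This is not seplyr's actual code: group_mean_se() is a hypothetical helper invented for illustration, and the sketch assumes dplyr (1.0 or later, for the .groups argument) and rlang are installed. The caller passes plain strings, and the rlang calls stay hidden inside the function.

```r
library(dplyr)
library(rlang)

# Hypothetical helper (not part of seplyr): takes column names as strings
# and converts them to symbols internally, so callers never touch rlang.
group_mean_se <- function(d, group_cols, value_col) {
  d %>%
    group_by(!!!syms(group_cols)) %>%                 # splice a list of symbols
    summarise(mean_value = mean(!!sym(value_col)),   # unquote a single symbol
              .groups = "drop")
}

# Column names are ordinary runtime values here
grouping <- "cyl"
value <- "mpg"
res <- group_mean_se(mtcars, grouping, value)
print(res)
```

The design point is that the string-to-symbol conversion happens in exactly one place, so the rest of the user's code can keep treating column names as values.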

Your wrapr::let() example is good. We are now teaching a convention in which the replacement target and the variable holding its value differ only by capitalization.

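parametrized_plot <- function(xvariable, yvariable, ...) {
  # Upper-case names are replacement targets; the lower-case
  # arguments hold the actual variable names as strings
  wrapr::let(
    list(XVARIABLE = xvariable, YVARIABLE = yvariable),
    {
      title = paste(yvariable, "vs", xvariable)
      plot(XVARIABLE, YVARIABLE, main = title)
    }
  )
}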


set.seed(1234)
xvar = runif(100) - 0.5
yvar = dnorm(xvar)

xvariable <- "xvar"
yvariable <- "yvar"

parametrized_plot(xvariable, yvariable)

This is very readable, and you have both the value-carrying version (lower case) and the replacement target (upper case) available in your code. So you can use any mix of standard (value-consuming) and non-standard (name-capturing) interfaces with no trouble.
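As a rough illustration of mixing the two kinds of interface inside one let() block, here is a small sketch applied to a data frame column. The function col_means() and its argument names are made up for this example, and it assumes wrapr is installed.

```r
# Hypothetical helper: computes the same column mean two ways
col_means <- function(d, colname) {
  wrapr::let(
    c(COLNAME = colname),
    {
      # non-standard (name-capturing) interface: COLNAME is replaced
      # by the column's name before the block is evaluated
      nse_mean <- with(d, mean(COLNAME))
      # standard (value-consuming) interface: the string is used directly
      se_mean <- mean(d[[colname]])
      c(nse = nse_mean, se = se_mean)
    }
  )
}

target <- "mpg"
out <- col_means(mtcars, target)
print(out)
```

Both entries should agree, since they read the same column through different interfaces.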

Our thinking is: if you don't want to get involved with replacement details, use seplyr. If you do want to get involved with replacement details, use wrapr::let().

Thanks for the comment on the vignettes. They are indeed rmarkdown::html_vignette.

Good tip about using seplyr as much as possible, and let only for corner cases or things that are not yet covered by seplyr. I just started reading the seplyr docs, and it looks quite nice.

I have no doubt there are good design decisions behind the machinery that powers tidy evaluation, but the fact is that, as an end user who's not a programmer, I just don't care to use it.

E.g., why does one thing need to be evaluated with 3 exclamation marks while others need 2? Why is one thing called quos(...) while another is enquo(...)? Why do I have to call rlang to do stuff in a function? I don't see why it needs to be so difficult to pass values into a function.

Again, there are likely very good reasons for this, but seeing that something like seplyr shows it's possible to get dplyr to behave in an intuitive manner, I see little reason to learn tidyeval.

The only issue I have with seplyr is that the ":=" operator messes with data.table, but that's a niggle more than anything else.
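For anyone reading along who is wondering about the !! versus !!! and quos() versus enquo() distinction asked about above, here is a minimal sketch of what each one does, assuming rlang is installed. The function show_one() and the symbols a and b are placeholders for illustration only.

```r
library(rlang)

# enquo() is used inside a function: it captures the expression the
# caller supplied for one argument
show_one <- function(x) {
  as_label(enquo(x))
}

# quos() quotes several expressions written in place
many <- quos(a, b + 1)
many_labels <- unname(vapply(many, as_label, character(1)))

# !! unquotes a single object into an expression
one_sym <- sym("a")
single <- deparse(expr(mean(!!one_sym)))

# !!! splices every element of a list in as separate arguments
several_syms <- syms(c("a", "b"))
spliced <- deparse(expr(sum(!!!several_syms)))

print(show_one(mpg))  # "mpg"
print(many_labels)    # "a" "b + 1"
print(single)         # "mean(a)"
print(spliced)        # "sum(a, b)"
```

So the two-versus-three exclamation marks come down to one object versus a list of objects, and enquo() versus quos() comes down to capturing a function argument versus quoting expressions directly.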