A confusing result from `tidyeval` (Can you believe it!?)


#1

A friend in our local R-stats meetup posted this question today. I found it intriguing and wondered if anyone could shed light on it:

suppressMessages(library(dplyr))
wtf <- function(df = mtcars, test = "stupid"){
  
  df %>%
    mutate(!! test := mpg) %>%
    mutate(!! test := (!! test) + 1)
}

wtf_2 <- function(df = mtcars, test = "stupid"){
  
  df %>%
    mutate(!! test := mpg) %>%
    mutate(!! test := (!! as.name(test)) + 1)
}

wtf()
#> Error in mutate_impl(.data, dots): Evaluation error: non-numeric argument to binary operator.

head(wtf_2())
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb stupid
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4   22.0
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4   22.0
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1   23.8
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1   22.4
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2   19.7
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1   19.1

Created on 2018-08-22 by the reprex package (v0.2.0).

Obviously, we got it to work by adding in the as.name() function around test, but that doesn't feel very...tidy.

We also managed to identify the source of the issue:

> quo(mutate(df, !! test := (!! test) + 1))
<quosure>
  expr: ^mutate(df, "stupid" := "stupid" + 1)
  env:  global
> quo(mutate(df, !! test := (!! as.name(test)) + 1))
<quosure>
  expr: ^mutate(df, "stupid" := stupid + 1)
  env:  global

But nobody could come up with the right function to make this work in the tidyeval style. Do any of you have insight?
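
The difference between the two quosures can be reproduced without dplyr. This is a minimal sketch, assuming rlang 0.2.0 or later (where `!!` binds more tightly than `+`); the toy `data` list is made up for illustration and stands in for a data frame:

```r
library(rlang)

test <- "stupid"

# Unquoting a string inlines the string itself into the expression.
e_string <- expr(!!test + 1)

# Unquoting a symbol inlines a reference to the object named `stupid`.
e_symbol <- expr(!!as.name(test) + 1)

# Toy stand-in for a data frame column (illustration only).
data <- list(stupid = c(21, 22.8))

# The symbol version looks `stupid` up in the data and adds 1.
ok <- eval_tidy(e_symbol, data = data)

# The string version tries to add 1 to a character value and fails;
# tryCatch() keeps the error message instead of stopping.
err <- tryCatch(
  eval_tidy(e_string, data = data),
  error = function(e) conditionMessage(e)
)
```

Here `ok` holds the incremented values, while `err` holds the same kind of "non-numeric argument" message that `wtf()` produced.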


#2
suppressMessages(library(dplyr))
wtf <- function(df = mtcars, test = quo(stupid)){
  
  df %>%
    mutate(!! test := mpg) %>%
    mutate(!! test := (!! test) + 1)
}
head(wtf())
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb stupid
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4   22.0
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4   22.0
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1   23.8
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1   22.4
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2   19.7
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1   19.1

Created on 2018-08-22 by the reprex package (v0.2.0.9000).


#3

And using enexpr() should let you use an unquoted argument:

suppressMessages(library(dplyr))
wtf <- function(df = mtcars, test = stupid){
  test <- enexpr(test)
  df %>%
    mutate(!! test := mpg) %>%
    mutate(!! test := !! test + 1)
}
head(wtf())
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb stupid
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4   22.0
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4   22.0
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1   23.8
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1   22.4
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2   19.7
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1   19.1

Created on 2018-08-22 by the reprex package (v0.2.0).


#4

Well, shoot, I get

Error: The LHS of := must be a string or a symbol

with your quo(stupid) code. I'm working with the CRAN versions of rlang/dplyr only, though, so maybe this is a newer change?


#5

Either way, Edgar's version with enexpr() makes more sense, I think! :slight_smile:


#6

While enexpr is cleaner, you can also do it with enquo and quo_name:

suppressMessages(library(dplyr))
wtf <- function(df = mtcars, test = stupid){
  test <- enexpr(test)
  df %>%
    mutate(!! test := mpg) %>%
    mutate(!! test := !! test + 1)
}

wtf_2 <- function(df = mtcars, test = stupid){
  test <- enquo(test)
  test_name <- quo_name(test)
  df %>%
    mutate(!! test_name := mpg) %>%
    mutate(!! test_name := !! test + 1)
}

identical(wtf(), wtf_2())
#> [1] TRUE

Created on 2018-08-22 by the reprex package (v0.2.0).

Clearly, in this case, enexpr seems more appropriate since it takes care of both enquo and quo_name steps in one, but this way may be handy in other cases, so I thought I would post.
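
To see what each capture step holds, here is a small sketch that needs only rlang. `inspect_arg()` is a hypothetical helper written for this post, not part of any package:

```r
library(rlang)

# Hypothetical helper: capture one unquoted argument both ways and report
# what each capture holds.
inspect_arg <- function(test) {
  as_quo  <- enquo(test)    # expression plus the caller's environment
  as_expr <- enexpr(test)   # bare expression only
  list(
    quo_is_quosure = is_quosure(as_quo),
    expr_is_symbol = is.symbol(as_expr),
    name           = quo_name(as_quo)   # plain string, usable on the LHS of :=
  )
}

res <- inspect_arg(stupid)
res$name
#> [1] "stupid"
```

The string from quo_name() is what goes on the left of :=, and the quosure (or the bare symbol from enexpr()) is what gets unquoted on the right.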


#7

You get an error because, as you saw in your "identifying the issue" code, what you need on the RHS of := is a symbol (or a name), not a string. You managed that with as.name.
Per the rlang quotation documentation:

Symbols represent the name that is given to an object in a particular context

This is what you want.

The equivalent function in rlang is sym. So you can make it work by changing just that, while still providing test = "stupid" as a string:

suppressMessages(library(dplyr))
#> Warning: package 'dplyr' was built under R version 3.4.4
wtf <- function(df = mtcars, test = "stupid"){
    df %>%
        mutate(!! test := mpg) %>%
        mutate(!! test := !! sym(test) + 1)
}
head(wtf())
#> Warning: package 'bindrcpp' was built under R version 3.4.4
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb stupid
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4   22.0
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4   22.0
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1   23.8
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1   22.4
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2   19.7
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1   19.1

Created on 2018-08-22 by the reprex package (v0.1.1.9000).

All the other options mentioned in this thread are also correct if you want to go full tidyeval and provide test directly as an expression rather than as a string.
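
As a quick check of the sym/as.name equivalence, here is a minimal sketch that uses only rlang and base R (the variable names are just for illustration):

```r
library(rlang)

col <- "stupid"

# sym() builds the same symbol object that base R's as.name() does.
same_symbol <- identical(sym(col), as.name(col))
same_symbol
#> [1] TRUE

# A symbol is a language object; the original string is not.
is.symbol(sym(col))
#> [1] TRUE
is.symbol(col)
#> [1] FALSE

# as_string() converts the symbol back to a string.
round_trip <- as_string(sym(col))
```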


#8

Perfect! Thanks for all the help!


#9

A more generalized example:

suppressMessages(library(dplyr))

wtf <- function(df = mtcars, var = mpg, test = stupid){
  test <- enexpr(test)
  var <- enexpr(var)
  df %>%
    mutate(!! test :=  !!var)
}

mtcars %>%
  wtf(mpg + 1) %>%
  head()
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb stupid
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4   22.0
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4   22.0
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1   23.8
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1   22.4
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2   19.7
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1   19.1

iris %>%
  wtf(Sepal.Length + 1) %>%
  head()
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species stupid
#> 1          5.1         3.5          1.4         0.2  setosa    6.1
#> 2          4.9         3.0          1.4         0.2  setosa    5.9
#> 3          4.7         3.2          1.3         0.2  setosa    5.7
#> 4          4.6         3.1          1.5         0.2  setosa    5.6
#> 5          5.0         3.6          1.4         0.2  setosa    6.0
#> 6          5.4         3.9          1.7         0.4  setosa    6.4

Created on 2018-08-22 by the reprex package (v0.2.0).
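
For anyone curious what the two enexpr() calls hand over, here is a rough dplyr-free analogue. `add_col()` is a hypothetical helper that mimics the mutate() step with rlang::eval_tidy(); it is a sketch of the idea, not how dplyr implements it:

```r
library(rlang)

# Hypothetical analogue of the mutate() call above: evaluate a captured
# expression against the data frame and store it under a captured name.
add_col <- function(df, var, name = stupid) {
  var  <- enexpr(var)    # e.g. mpg + 1, kept unevaluated
  name <- enexpr(name)   # e.g. the symbol `stupid`
  df[[as_string(name)]] <- eval_tidy(var, data = df)
  df
}

out <- add_col(head(mtcars, 3), mpg + 1)
out$stupid
#> [1] 22.0 22.0 23.8
```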


#10

Here is another attempt to explain the distinctions. Please consider the following 3 examples.

library("dplyr")


mtcars %>%
  mutate(newcol := mpg) %>%
  mutate(newcol := newcol + 1) %>%
  head() # works


SYMBOL <- rlang::sym("newcol")
mtcars %>%
  mutate(!!SYMBOL := mpg) %>%
  mutate(!!SYMBOL := !!SYMBOL + 1) %>%
  head() # works


SYMSTR <- "newcol"
mtcars %>%
  mutate(!!SYMSTR := mpg) %>%
  mutate(!!SYMSTR := !!SYMSTR + 1) %>%
  head() # errors-out

The third one (SYMSTR) errors out, which means that strings are not shorthand for symbols (fair enough). The issue is that the mental model that "!!" substitutes in newcol is wrong; it in fact substitutes in "newcol" (notice the quotes).

What saves a lot of examples is that dplyr::mutate() is willing to accept quoted strings on the left-hand sides of assignments (it does not insist on unquoted symbols).

mtcars %>%
  mutate("newcol" := mpg) %>%
  mutate("newcol" := newcol + 1) %>%
  head() # works

mtcars %>%
  mutate("newcol" := mpg) %>%
  mutate("newcol" := "newcol" + 1) %>%
  head() # errors-out

Additional complexity comes from the fact that "!!" is willing to substitute both names (column names, names of variables, and so on) and values (strings, possibly numbers). This evidently confuses some users, who expect only name substitution.
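
A minimal sketch of that point, using only rlang::expr() (the function name `f` is a placeholder and is never called):

```r
library(rlang)

# Each !! inlines whatever object it is given: a symbol, a string, or a number.
e <- expr(f(!!sym("newcol"), !!"newcol", !!2))

# Inspect the type of each inlined argument.
arg_types <- vapply(as.list(e)[-1], typeof, character(1))
arg_types
#> [1] "symbol"    "character" "double"
```

Only the first argument refers to a column; the second is the literal string "newcol".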

Our function wrapr::let() tries to stay closer to a names-only substitution model.

library("wrapr")
library("dplyr")
let(
  c(NEWCOL = "newcol"),
  mtcars %>%
    mutate(NEWCOL = mpg) %>%
    mutate(NEWCOL = NEWCOL + 1) %>%
    head() 
)

We have a formal write up on wrapr::let() here.


#11

You might like to try friendlyeval to help find the correct rlang function in cases where it's unclear.

There's a string you want to use as a column name. There's a function for that:

library(friendlyeval)
wtf_3 <- function(df = mtcars, test = "stupid"){
  
  df %>%
    mutate(!! friendlyeval::treat_string_as_col(test) := mpg) %>%
    mutate(!! friendlyeval::treat_string_as_col(test) := !! friendlyeval::treat_string_as_col(test) + 1)
}

Looks ugly, no? No worries, transpile it away with the RStudio addin to:

wtf_3 <- function(df = mtcars, test = "stupid"){
  
  df %>%
    mutate(!! rlang::sym(test) := mpg) %>%
    mutate(!! rlang::sym(test) := !! rlang::sym(test) + 1)
}

> identical (wtf_2(), wtf_3())
# [1] TRUE

Turns out @cderv was on the money! :grin:


#12

This is a property of R generally, not of dplyr specifically:

"x" <- 1:10
"mean"(x)
#> [1] 5.5
mean("x" = x)
#> [1] 5.5

Created on 2018-09-11 by the reprex package (v0.2.0).


#13

Interesting.

However, notice strings and free-names are discernible in the presence of ":=" in the "..." region (which was the mutate() "assignment" case most of us were discussing).

f <- function(...) { match.call() }

f(x := 7)
# f(`:=`(x, 7))

f("x" := 7)
# f(`:=`("x", 7))

So dplyr could, in principle, decide case by case whether or not to accept such notation (strings on the left-hand sides of :=).
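
Here is a base-R sketch of how a function could tell the two forms apart. `lhs_kind()` is a hypothetical helper for illustration; I am not claiming this is how dplyr checks it:

```r
# Hypothetical helper: report whether the left-hand side of a `:=` call was
# written as a bare name or as a string.
lhs_kind <- function(call) {
  stopifnot(is.call(call), identical(call[[1]], as.name(":=")))
  lhs <- call[[2]]
  if (is.symbol(lhs)) {
    "name"
  } else if (is.character(lhs)) {
    "string"
  } else {
    "other"
  }
}

lhs_kind(quote(x := 7))
#> [1] "name"
lhs_kind(quote("x" := 7))
#> [1] "string"
```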

Exceptions to an "R always does this" understanding include:

quote("X"$a)
# "X"$a

X <- list(a = 5)
X$a
# [1] 5
"X"$a
# Error in "X"$a : $ operator is invalid for atomic vectors
function("X" = 1) { X }
# Error: unexpected string constant in "function("X""

#14

Oh, interesting. Just to clarify (for myself and any future browser unfamiliar with match.call(), as I was), R gets rid of the ""* in the absence of := like so:

f <- function(...) { match.call() }
f("mean"(x))
#> f(mean(x))
f(mean("x" = x))
#> f(mean(x = x))

f(mean("x" := x))
#> f(mean(`:=`("x", x)))
f(mean(x := x))
#> f(mean(`:=`(x, x)))

Created on 2018-09-11 by the reprex package (v0.2.0.9000)

* distinguishes between free-names and strings


#15

Yes. The match.call() is quoting f() and arguments (into language objects, not character). So one can also write those examples as:

quote("mean"(x))
# mean(x)
quote(mean("x" = x))
# mean(x = x)

quote(mean("x" := x))
# mean(`:=`("x", x))
quote(mean(x := x))
# mean(`:=`(x, x))

A case of this sort of thing I had to deal with in the wrapr::let() implementation is (notes here):

quote(d$"X")
# d$X

Though it is worth repeating that "$" isn't symmetric.

quote("X"$a)
# "X"$a

X <- list(a = 5)
X$a
# [1] 5
"X"$a
# Error in "X"$a : $ operator is invalid for atomic vectors

So it is something I have thought a bit about; it just wasn't at the top of my mind (or worth the length) in my first note.

The one that gave me chills is when Gabe Becker taught me the following:

quote(7 -> x)
# x <- 7
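
A small base-R check of that last one, in case it looks like a printing quirk (the variable names are only for illustration):

```r
# The parser rewrites right assignment into left assignment, so the two
# quoted calls are the same object.
same_call <- identical(quote(7 -> x), quote(x <- 7))
same_call
#> [1] TRUE

# The function at the head of the call is `<-`; no `->` call is ever built.
head_fn <- as.character(quote(7 -> x)[[1]])
head_fn
#> [1] "<-"
```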