Evaluating `...` using {rlang} when supplying a vector

boshek · November 13, 2019, 7:48pm

Hi all,

I have gotten myself confused with {rlang} and I was hoping someone could help me. Say I have this toy function that I want to return a vector:

return_vector <- function(...){
  dots <- rlang::exprs(...)
  paste0(dots, collapse = ", ")
}

If I supply two inputs, it works fine:

return_vector(foo, bar)
#> [1] "foo, bar"

But if I assign the names to a vector, I can't seem to get the function to evaluate that vector:

foobar <- c("foo","bar")
return_vector(foobar)
#> [1] "foobar"

If I use !! I can evaluate it:

return_vector(!!foobar)
#> [1] "c(\"foo\", \"bar\")"

But really I want to do that inside the function. I tried this but it didn't seem to work:

return_vector2 <- function(...){
  dots <- rlang::exprs(...)
  paste0(!!dots, collapse = ", ")
}

return_vector2(foobar)
#> Error in !dots: invalid argument type

Created on 2019-11-13 by the reprex package (v0.3.0)

I feel like I must be missing something but if anyone has any input, it would be much appreciated.
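
For readers hitting the same error: as far as I understand it, `!!` only gets its unquoting meaning inside functions that support quasiquotation (rlang::expr(), rlang::exprs() and friends). paste0() is an ordinary function, so there `!!dots` is just base R's logical negation applied twice, and negating a list fails. A minimal sketch of the difference (the names `x`, `inside` and `outside` are only for illustration):

```r
library(rlang)

x <- c("foo", "bar")

# Inside a quasiquoting function, !! inlines the value of x into the expression.
inside <- expr(paste0(!!x, collapse = ", "))
eval(inside)

# Outside quasiquotation, !! is just `!` applied twice, so a list of
# symbols (which is what exprs(...) returns) cannot be negated.
outside <- tryCatch(
  !!list(quote(foo)),
  error = function(e) conditionMessage(e)
)
outside
```

So the unquoting has to happen while the expression is being captured, not afterwards on the captured list.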

mishabalyasin · November 13, 2019, 8:01pm

paste0 takes dots already, so you don't really need to use rlang:

foobar <- c("foo","bar")
return_vector2 <- function(...){
  paste0(..., collapse = ", ")
}

return_vector2(foobar) 
#> [1] "foo, bar"

Created on 2019-11-13 by the reprex package (v0.3.0)

What is the actual problem you are trying to solve?

boshek · November 13, 2019, 8:18pm

Indeed that does work for supplying a vector but it fails when supplying bare variable names:

> return_vector2 <- function(...){
   paste0(..., collapse = ", ")
 }
> return_vector2(foo, bar)
Error in paste0(..., collapse = ", ") : object 'bar' not found

mishabalyasin wrote: "What is the actual problem you are trying to solve?"

I am writing a select method for a package so I am actually just trying to replicate what {dplyr} does:


library(dplyr, warn.conflicts = FALSE)

starwars %>% 
  select(name, height)
#> # A tibble: 87 x 2
#>    name               height
#>    <chr>               <int>
#>  1 Luke Skywalker        172
#>  2 C-3PO                 167
#>  3 R2-D2                  96
#>  4 Darth Vader           202
#>  5 Leia Organa           150
#>  6 Owen Lars             178
#>  7 Beru Whitesun lars    165
#>  8 R5-D4                  97
#>  9 Biggs Darklighter     183
#> 10 Obi-Wan Kenobi        182
#> # ... with 77 more rows

nh <- c("name", "height")

starwars %>% 
  select(nh)
#> # A tibble: 87 x 2
#>    name               height
#>    <chr>               <int>
#>  1 Luke Skywalker        172
#>  2 C-3PO                 167
#>  3 R2-D2                  96
#>  4 Darth Vader           202
#>  5 Leia Organa           150
#>  6 Owen Lars             178
#>  7 Beru Whitesun lars    165
#>  8 R5-D4                  97
#>  9 Biggs Darklighter     183
#> 10 Obi-Wan Kenobi        182
#> # ... with 77 more rows

Created on 2019-11-13 by the reprex package (v0.3.0)

raytong · November 15, 2019, 6:06am

Hi @boshek. To handle the three situations that you mentioned in the first post, I suggest the following code. rlang::exprs will return a list, so you can unlist it to a vector and paste the elements together. Hope it helps.

library(tidyverse)

return_vector <- function(...){
  dots <- rlang::exprs(...)
  paste(unlist(dots), collapse = ", ")
}

foobar <- c("foo","bar")
return_vector(foo, bar)
#> [1] "foo, bar"
return_vector("foo", "bar")
#> [1] "foo, bar"
return_vector(!!foobar)
#> [1] "foo, bar"

Created on 2019-11-15 by the reprex package (v0.3.0)

boshek · November 15, 2019, 7:56pm

Thanks for taking a stab @raytong. I realize I can evaluate the variable before I supply it to the function. However I want to mimic select's behaviour and evaluate it internally a la:

library(dplyr)

nh <- c("name", "height")

starwars %>% 
  select(nh)

joels · November 15, 2019, 8:38pm

How does paste figure in what you're trying to do, or was that just for illustration? I'm trying to understand your actual use case.

If you want to give people the option of selecting columns either with bare column names or with strings, does select_at do what you need?

starwars %>% select_at(vars(name, height))
starwars %>% select_at(vars("name", "height"))

You can also create a function that's relatively flexible about the nature of the ... argument, and without the need for quosures. For example:

sel_fnc = function(data, ...) {
  data %>% 
    select_at(vars(...))
}

starwars %>% 
  sel_fnc("height", name)

starwars %>% 
  sel_fnc(height, name)

starwars %>% 
  sel_fnc(c(height, name))

starwars %>% 
  sel_fnc(c(height, name), c("skin_color", "mass"), c("homeworld", species))

boshek · November 15, 2019, 9:49pm

Thanks @joels

Hmm... For some reason I am not articulating myself very well.

So I will try to take a deeper dive. I am working on a package that sends CQL queries to a web feature service. The API lets you specify which columns you'd like returned (along with filtering) via those queries. That opens up a path to creating a dplyr-like syntax to make those requests. We've made something that is very much like what happens in dbplyr, which lazily constructs a query that is ultimately sent by collect. Here is the method as it is currently implemented:

github.com

bcgov/bcdata/blob/d3af43b5b0ac0a198041aaa531c9604b84896f68/R/utils-classes.R#L221-L231


select.bcdc_promise <- function(.data, ...){
  dots <- rlang::exprs(...)

  ## Always add back in the geom
  cols_to_select <- paste(geom_col_name(.data$cols_df), paste0(dots, collapse = ","), sep = ",")

  query_list <- c(.data$query_list, propertyName = cols_to_select)

  as.bcdc_promise(list(query_list = query_list, cli = .data$cli,
                       record = .data$record, cols_df = .data$cols_df))
}

This works fine if you supply bare variable names. The variable names are turned into a single string. The problem is that if you supply an object that is a vector (foobar in my example), I get lost in rlang-world. To construct the API call, I need to turn both bare variables AND objects that are vectors into strings. That is, if an input to ... is a vector, I need to evaluate it inside the function. If it is not, then I just pass those variables (after using rlang::exprs) to paste to create the string for the API call. So this is approximately what the function would look like:

return_vector <- function(...){
  
  dots <- rlang::exprs(...)
  # Some code that evaluates an element of dots if it is an object holding
  # a vector; otherwise the bare names just get passed to paste
  paste0(dots, collapse = ", ")
}

dplyr manages to do both of these things at the same time and knows which to evaluate and which to use directly as a "selecting" variable, e.g.:

library(dplyr)
nh <- c("name", "height")

starwars %>% 
  select(nh, mass)

I think my toy example should be sufficient to illustrate my problem (i.e. I'm not asking you to solve my issues), but here is the original issue for context: https://github.com/bcgov/bcdata/issues/131
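
To make the placeholder comment in the toy function concrete, here is one rough way it could be filled in with plain rlang. This is only a sketch under a loud assumption: a bare name is treated as an external vector whenever it resolves to a character vector in the caller's environment, and as a column name otherwise. That heuristic is my own, not how dplyr decides, and the name `return_vector3` is made up for illustration.

```r
library(rlang)

# Hypothetical helper: NOT dplyr's algorithm, just a heuristic sketch.
return_vector3 <- function(...) {
  quos <- enquos(...)

  parts <- lapply(quos, function(q) {
    expr <- quo_get_expr(q)

    # String literals such as "foo" are used as they are.
    if (is.character(expr)) {
      return(expr)
    }

    if (is_symbol(expr)) {
      nm <- as_string(expr)
      env <- quo_get_env(q)

      # Assumption: a name bound to a character vector in the caller's
      # environment is an external vector and gets evaluated.
      if (exists(nm, envir = env, inherits = TRUE)) {
        val <- get(nm, envir = env, inherits = TRUE)
        if (is.character(val)) {
          return(val)
        }
      }

      # Otherwise the bare name itself is the column name.
      return(nm)
    }

    # Anything else (calls etc.) is simply deparsed.
    as_label(expr)
  })

  paste0(unlist(parts, use.names = FALSE), collapse = ", ")
}

foobar <- c("foo", "bar")
return_vector3(foo, bar)
#> [1] "foo, bar"
return_vector3(foobar)
#> [1] "foo, bar"
return_vector3(foobar, baz, "qux")
#> [1] "foo, bar, baz, qux"
```

The obvious weakness is ambiguity: if a column happens to share its name with a character vector in the environment, the vector silently wins, which is the opposite of what a user would probably expect.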

aosmith · November 15, 2019, 10:20pm

I think you may be able to build off @joels idea, basing your function code roughly off of that used in select_at().

I dove in to look at the code for select_at(), and ended up finding a possible way forward for your problem in dplyr:::tbl_at_vars(). Using tidyselect::vars_select() with vars(), you can convert all variables, bare or strings or vectors, in ... to a character vector.

However, this is based on having the names of all variables as a starting point. Since it looks like your "real" function has a .data argument this might be a useful approach.

First, an example showing how we can get the variables as a string of characters no matter how we pass variable names to .... I used the mtcars variables to demonstrate.

library(dplyr)

var_names = function(.data, ...) {
     allvars = names(.data)
     tidyselect::vars_select(allvars,  !!!vars(...) )
}

var_names(mtcars, mpg, "cyl", c("am", "disp") )
#>    mpg    cyl     am   disp 
#>  "mpg"  "cyl"   "am" "disp"

Then an example of how this could look with your original paste() example.

return_vector = function(.data, ...){
     allvars = names(.data)
     vars = tidyselect::vars_select(allvars, !!!vars(...) )
     paste0(vars, collapse = ", ")
}

return_vector(mtcars, "cyl", mpg, c("am", "disp") )
#> [1] "cyl, mpg, am, disp"

Created on 2019-11-15 by the reprex package (v0.2.0).

boshek · November 15, 2019, 11:36pm

Yep! That's totally it.

davis · November 16, 2019, 1:11pm

@aosmith has the right idea here for the current version of tidyselect, but I thought this was a nice question to take a moment and point out that this is going to be changing a little bit (hopefully for the better!) in the next version of tidyselect.

First, I think @aosmith's solution can be simplified a little bit like this. Since the dots aren't needed elsewhere, we can just pass them straight through to vars_select() without defusing them with vars() first.

# devtools::install_github("r-lib/tidyselect")

library(tidyselect)
library(rlang)

var_names <- function(.data, ...) {
  vars_select(names(.data), ...)
}

var_names(mtcars, mpg, "cyl")
#>   mpg   cyl 
#> "mpg" "cyl"

It was also mentioned that you wanted to match against a variable holding a character vector like this

am_disp <- c("am", "disp")

This is considered ambiguous in the new version of tidyselect. Is this a column in mtcars named am_disp? Or is this a variable that tidyselect needs to evaluate? Because of this, you will now get this message:

var_names(mtcars, mpg, "cyl", am_disp)
#> Note: Using an external vector in selections is brittle.
#> ℹ If the data contains `am_disp` it will be selected instead.
#> ℹ Use `all_of(am_disp)` instead of `am_disp` to silence this message.
#> This message is displayed once per session.
#>    mpg    cyl     am   disp 
#>  "mpg"  "cyl"   "am" "disp"

Instead you should use the new all_of() (which supersedes one_of()) to tell tidyselect this is a variable you want to evaluate.

var_names(mtcars, mpg, "cyl", all_of(am_disp))
#>    mpg    cyl     am   disp 
#>  "mpg"  "cyl"   "am" "disp"
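
The ambiguity is easiest to see with a data frame that really does contain a column named like the external vector. A small sketch, assuming tidyselect >= 1.0.0 (where eval_select() and all_of() are available); `df_clash` is a made-up data frame for illustration:

```r
library(tidyselect)

# Hypothetical data: one column shares its name with the external vector.
df_clash <- data.frame(am_disp = 1, am = 2, disp = 3)
am_disp <- c("am", "disp")

# A bare name is matched against the columns first, so the column wins.
col_wins <- eval_select(rlang::expr(am_disp), df_clash)
names(col_wins)
#> [1] "am_disp"

# all_of() marks am_disp as an external vector to be evaluated.
vec_wins <- eval_select(rlang::expr(all_of(am_disp)), df_clash)
names(vec_wins)
#> [1] "am"   "disp"
```

Without all_of(), the result silently depends on which columns the data happens to have.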

Now, vars_select() is in the questioning stage for this new tidyselect version. It won't be going away any time soon, but there is a new solution to this kind of problem using a new function, eval_select(). This function takes an expression holding the variable selection you care about, and a data argument which tells tidyselect where to "look up" those variables. It returns a vector of positions of where to find the variables in data, and the names are the column names. It works somewhat like this.

cols_expr <- expr(c(mpg, cyl, "disp"))
eval_select(cols_expr, mtcars)
#>  mpg  cyl disp 
#>    1    2    3

Notice how we wrap the 3 variables in c() in the expr() call to bundle them together. Now we can build var_names() with eval_select() using the same pattern. We just bundle the names passed in the ... with c(). I'll call this one eval_names().

eval_names <- function(.data, ...) {
  expr <- rlang::expr(c(...))
  eval_select(expr, data = .data)
}

eval_names(mtcars, mpg, "cyl", all_of(am_disp))
#>  mpg  cyl   am disp 
#>    1    2    9    3

And return_vector() is easy to build on that.

return_vector <- function(.data, ...) {
  positions <- eval_names(.data, ...)
  paste0(names(positions), collapse = ", ")
}

return_vector(mtcars, mpg, "cyl", all_of(am_disp))
#> [1] "mpg, cyl, am, disp"
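
For the original use case, where only the column names are known and a geometry column must always be included, the same pattern might look roughly like the following. This is a sketch and not bcdata's actual code: `build_property_name`, `col_names` and `geom_col` are hypothetical names, and it assumes tidyselect >= 1.0.0, whose eval_select() accepts a named list as `data`.

```r
library(tidyselect)

# Hypothetical helper: turn a selection in ... into a comma-separated
# string of column names, always putting the geometry column first.
build_property_name <- function(col_names, geom_col, ...) {
  # eval_select() only needs something with names, so a named list of
  # positions stands in for the (not yet downloaded) data.
  lookup <- stats::setNames(as.list(seq_along(col_names)), col_names)

  positions <- eval_select(rlang::expr(c(...)), data = lookup)

  paste(unique(c(geom_col, names(positions))), collapse = ",")
}

cols <- c("id", "name", "height", "geometry")
nh <- c("name", "height")

build_property_name(cols, "geometry", id, all_of(nh))
#> [1] "geometry,id,name,height"
```

unique() keeps the geometry column from appearing twice if the user also selects it explicitly.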

Lastly, there are really two selection syntaxes that you can use with tidyselect. One is by specifying the names in the ... like how we have done here, and like how dplyr::select() does. The other is to specify names in a single variable, like in tidyr::pivot_longer(data = mtcars, cols = c(vs, cyl)). We can build a version of eval_names() that works that way too. To do that, you first enquo() the cols to defuse it, preventing cols from trying to immediately evaluate and try to "find" your variables too early. That can be directly passed on to eval_select().

eval_names2 <- function(.data, cols) {
  cols <- rlang::enquo(cols)
  eval_select(cols, data = .data)
}

eval_names2(mtcars, c(mpg, "cyl", all_of(am_disp)))
#>  mpg  cyl   am disp 
#>    1    2    9    3

From there you could easily wrap eval_names() and eval_names2() to (mostly) mimic what dplyr::select() does. With select_from_eval_names() we don't have to do anything special because the ... can just be passed all the way through down to eval_select() through eval_names(). With select_from_eval_names2(), we do have to add the extra step of defusing the cols argument with enquo() to keep it from trying to look up c(mpg, "cyl") immediately, and then pass it through to eval_names2() with !!.

select_from_eval_names <- function(.data, ...) {
  positions <- eval_names(.data, ...)
  .data[positions]
}

select_from_eval_names2 <- function(.data, cols) {
  cols <- rlang::enquo(cols)
  positions <- eval_names2(.data, !!cols)
  .data[positions]
}

mtcars_small <- mtcars[1:3,]

select_from_eval_names(mtcars_small, mpg, "cyl")
#>                mpg cyl
#> Mazda RX4     21.0   6
#> Mazda RX4 Wag 21.0   6
#> Datsun 710    22.8   4

select_from_eval_names2(mtcars_small, c(mpg, "cyl"))
#>                mpg cyl
#> Mazda RX4     21.0   6
#> Mazda RX4 Wag 21.0   6
#> Datsun 710    22.8   4

To learn more about this, Lionel has written up a great new tidyselect vignette describing these ideas in even more detail! https://tidyselect.r-lib.org/articles/tidyselect.html#the-selection-evaluators

boshek · November 18, 2019, 6:41pm

@davis

Thanks for this super clear breakdown. I am working on this for a CRAN submission. Any rough timing for the next version of tidyselect that includes eval_select to appear on CRAN? Love that message about using an external vector in selections. Really helpful.

lionel · November 19, 2019, 12:07pm

If you just need the names of a selection, I recommend wrapping dplyr::select():

vars_dots <- function(.data, ...) {
  names(dplyr::select(.data, ...))
}

vars_arg <- function(data, arg) {
  names(dplyr::select(data, {{ arg }}))
}

This way you don't need tidyselect and don't have to worry about the next version.
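
A usage sketch for the case where you only have column names and no data frame yet: a zero-row data frame carrying those names can be handed to the wrapper. `empty_frame` is a hypothetical helper, and the all_of() call assumes a dplyr version that re-exports it (1.0.0 or later).

```r
library(dplyr, warn.conflicts = FALSE)

vars_dots <- function(.data, ...) {
  names(dplyr::select(.data, ...))
}

# Hypothetical helper: a zero-row data frame that carries only column names.
empty_frame <- function(col_names) {
  cols_list <- stats::setNames(
    rep(list(logical(0)), length(col_names)),
    col_names
  )
  # optional = TRUE keeps the names exactly as given.
  as.data.frame(cols_list, optional = TRUE)
}

cols <- c("id", "name", "height")
nh <- c("name", "height")

vars_dots(empty_frame(cols), id, all_of(nh))
#> [1] "id"     "name"   "height"
vars_dots(mtcars, mpg, "cyl")
#> [1] "mpg" "cyl"
```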
