@aosmith has the right idea here for the current version of tidyselect, but this question is a nice opportunity to point out that things are going to change a little bit (hopefully for the better!) in the next version of tidyselect.
First, I think @aosmith's solution can be simplified a little bit like this. Since the dots aren't needed elsewhere, we can just pass them straight through to vars_select() without defusing them with vars() first.
# devtools::install_github("r-lib/tidyselect")
library(tidyselect)
library(rlang)
var_names <- function(.data, ...) {
  vars_select(names(.data), ...)
}
var_names(mtcars, mpg, "cyl")
#> mpg cyl
#> "mpg" "cyl"
It was also mentioned that you wanted to match against a variable holding a character vector, like this:
am_disp <- c("am", "disp")
This is considered ambiguous in the new version of tidyselect. Is this a column in mtcars named am_disp? Or is this a variable that tidyselect needs to evaluate? Because of this, you will now get this message:
var_names(mtcars, mpg, "cyl", am_disp)
#> Note: Using an external vector in selections is brittle.
#> ℹ If the data contains `am_disp` it will be selected instead.
#> ℹ Use `all_of(am_disp)` instead of `am_disp` to silence this message.
#> This message is displayed once per session.
#> mpg cyl am disp
#> "mpg" "cyl" "am" "disp"
Instead you should use the new all_of() (which supersedes one_of()) to tell tidyselect this is a variable you want to evaluate.
var_names(mtcars, mpg, "cyl", all_of(am_disp))
#> mpg cyl am disp
#> "mpg" "cyl" "am" "disp"
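If some of the names in the character vector might not exist in the data, tidyselect also offers any_of(), which is not covered above. The sketch below assumes a hypothetical vector, wanted, that contains one name missing from mtcars, and compares how the two helpers react.

```r
library(tidyselect)

# Hypothetical vector: "not_a_column" does not exist in mtcars
wanted <- c("am", "disp", "not_a_column")

# any_of() quietly skips names that are missing from the data
present <- eval_select(rlang::expr(any_of(wanted)), mtcars)
names(present)

# all_of() is strict: a missing name is an error, captured here with tryCatch()
strict <- tryCatch(
  eval_select(rlang::expr(all_of(wanted)), mtcars),
  error = function(e) "error"
)
strict
```

Which one to use depends on whether a missing column should be treated as a bug (all_of()) or as something to tolerate (any_of()).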
Now, vars_select() is in the questioning stage for this new tidyselect version. It won't be going away any time soon, but there is a new solution to this kind of problem using a new function, eval_select(). This function takes an expression holding the variable selection you care about, and a data argument which tells tidyselect where to "look up" those variables. It returns a named vector: the values are the positions of the selected variables in data, and the names are the column names. It works somewhat like this.
cols_expr <- expr(c(mpg, cyl, "disp"))
eval_select(cols_expr, mtcars)
#> mpg cyl disp
#> 1 2 3
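As a further illustration of the "positions with names" return value, here is a small sketch that uses a selection helper inside the expression and then indexes the data with the result. The variable name d_cols is only for this example.

```r
library(tidyselect)
library(rlang)

# Selection helpers such as starts_with() work inside the expression too
d_cols <- eval_select(expr(starts_with("d")), mtcars)
d_cols
#> disp drat 
#>    3    5 

# Because the result is a vector of positions, it can index the data directly
head(mtcars[d_cols], 2)
```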
Notice how we wrap the 3 variables in c() in the expr() call to bundle them together. Now we can build var_names() with eval_select() using the same pattern. We just bundle the names passed in the ... with c(). I'll call this one eval_names().
eval_names <- function(.data, ...) {
  expr <- rlang::expr(c(...))
  eval_select(expr, data = .data)
}
eval_names(mtcars, mpg, "cyl", all_of(am_disp))
#> mpg cyl am disp
#> 1 2 9 3
And return_vector() is easy to build on that.
return_vector <- function(.data, ...) {
  positions <- eval_names(.data, ...)
  paste0(names(positions), collapse = ", ")
}
return_vector(mtcars, mpg, "cyl", all_of(am_disp))
#> [1] "mpg, cyl, am, disp"
Lastly, there are really two selection syntaxes that you can use with tidyselect. One is to specify the names in the ..., as we have done here and as dplyr::select() does. The other is to specify the names in a single argument, as in tidyr::pivot_longer(data = mtcars, cols = c(vs, cyl)). We can build a version of eval_names() that works that way too. To do that, you first defuse cols with enquo(), which keeps R from evaluating cols immediately and trying to "find" your variables too early. The defused cols can then be passed directly on to eval_select().
eval_names2 <- function(.data, cols) {
  cols <- rlang::enquo(cols)
  eval_select(cols, data = .data)
}
eval_names2(mtcars, c(mpg, "cyl", all_of(am_disp)))
#> mpg cyl am disp
#> 1 2 9 3
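To see why the enquo() step matters, here is a sketch of a hypothetical variant, eval_names_eager(), that skips it. Without defusing, R evaluates c(mpg, cyl) as ordinary code in the calling environment before tidyselect ever sees it, so the call fails (assuming no objects named mpg or cyl exist there).

```r
library(tidyselect)

# Hypothetical variant of eval_names2() that does NOT defuse cols
eval_names_eager <- function(.data, cols) {
  eval_select(cols, data = .data)
}

# cols is forced as regular R code, so the bare column names cannot be found;
# the error message is captured rather than stopping the script
result <- tryCatch(
  eval_names_eager(mtcars, c(mpg, cyl)),
  error = function(e) conditionMessage(e)
)
result
```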
From there you could easily wrap eval_names() and eval_names2() to (mostly) mimic what dplyr::select() does. With select_from_eval_names() we don't have to do anything special because the ... can just be passed all the way down to eval_select() through eval_names(). With select_from_eval_names2(), we do have to add the extra step of defusing the cols argument with enquo() to keep it from trying to look up c(mpg, "cyl") immediately, and then pass it through to eval_names2() with !!.
select_from_eval_names <- function(.data, ...) {
  positions <- eval_names(.data, ...)
  .data[positions]
}
select_from_eval_names2 <- function(.data, cols) {
  cols <- rlang::enquo(cols)
  positions <- eval_names2(.data, !!cols)
  .data[positions]
}
mtcars_small <- mtcars[1:3, ]
select_from_eval_names(mtcars_small, mpg, "cyl")
#>                mpg cyl
#> Mazda RX4     21.0   6
#> Mazda RX4 Wag 21.0   6
#> Datsun 710    22.8   4
select_from_eval_names2(mtcars_small, c(mpg, "cyl"))
#>                mpg cyl
#> Mazda RX4     21.0   6
#> Mazda RX4 Wag 21.0   6
#> Datsun 710    22.8   4
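One place where these wrappers only "mostly" mimic dplyr::select() is renaming. eval_select() accepts named selections such as c(miles = mpg) and reports the new name in the names of the returned positions, but plain indexing with .data[positions] keeps the original column names. A minimal sketch of applying the new names follows; select_with_rename() is a hypothetical helper for illustration, not part of tidyselect.

```r
library(tidyselect)
library(rlang)

# Hypothetical helper: like select_from_eval_names2(), but applies new names
select_with_rename <- function(.data, cols) {
  cols <- enquo(cols)
  positions <- eval_select(cols, data = .data)
  out <- .data[positions]
  # names(positions) carries any names supplied in the selection
  names(out) <- names(positions)
  out
}

renamed <- select_with_rename(mtcars[1:3, ], c(miles = mpg, "cyl"))
names(renamed)
#> [1] "miles" "cyl"
```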
To learn more about this, Lionel has written up a great new tidyselect vignette describing these ideas in even more detail! https://tidyselect.r-lib.org/articles/tidyselect.html#the-selection-evaluators