When should I use `all_of(vars)` instead of just `vars`?

junghoonshin · June 27, 2021, 2:54am

Let me start with an example.

library(tidyverse)

my_tbl = tibble(a = 1, b = 1, c = 1)

vars = c("a", "b")

my_tbl %>% select(vars)
## A tibble: 1 x 2
#      a     b
#  <dbl> <dbl>
#1     1     1

my_tbl %>% select(all_of(vars))
## A tibble: 1 x 2
#      a     b
#  <dbl> <dbl>
#1     1     1

vars = c("a", "b", "d")

my_tbl %>% select(vars)
#Error: Can't subset columns that don't exist.
#x Column `d` doesn't exist.
#Run `rlang::last_error()` to see where the error occurred.

my_tbl %>% select(all_of(vars))
#Error: Can't subset columns that don't exist.
#x Column `d` doesn't exist.
#Run `rlang::last_error()` to see where the error occurred.

As shown above, select(vars) and select(all_of(vars)) produce the same output in both cases. So when should I use select(all_of(vars)) instead of just select(vars)?

vkatti · June 27, 2021, 3:59am

If you're trying to select a column that doesn't exist, it will throw an error in both cases as shown in your example.

If you want to silently ignore the missing columns, use any_of() instead of all_of().

The latter, all_of(), is the safer choice here. Both all_of() and any_of() should be used when vars is a character vector of variable names.
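The difference between the two helpers shows up with the missing column "d" from the first post. This is a minimal sketch, assuming dplyr 1.0 or later (which re-exports any_of() and all_of() from tidyselect). tryCatch() is used only so the all_of() error can be inspected instead of stopping the script.

```r
library(dplyr)

my_tbl <- tibble(a = 1, b = 1, c = 1)
vars <- c("a", "b", "d")   # "d" is not a column of my_tbl

# any_of(): keeps the columns that exist and silently drops "d"
kept <- my_tbl %>% select(any_of(vars))
names(kept)   # "a" "b"

# all_of(): every name must exist, so selecting "d" raises an error
res <- tryCatch(
  my_tbl %>% select(all_of(vars)),
  error = function(e) e
)
inherits(res, "error")   # TRUE
```

So any_of() suits optional columns, while all_of() fails loudly on a typo or a missing column.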

junghoonshin · June 27, 2021, 5:29am

Thank you for the explanation, but actually, that doesn't answer my question.

"Both all_of() and any_of() should be used when vars is a character vector of variable names."

Why should all_of(vars) be used instead of just vars when vars is a character vector of variable names? As the example shows, both seem to do the same thing.

paulMT · July 12, 2021, 4:03pm

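With tidyverse functions it is not necessary to use quotes to refer to columns. The consequence, however, is that ambiguity can arise when you pass a character variable: if the data also contains a column with the same name as that variable, select() picks the column, not the names stored in the variable.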

An extreme example to make this clear.

library(dplyr)
Sepal.Length <- "Sepal.Width"

iris %>% select(Sepal.Length) %>% head(3)
#>   Sepal.Length
#> 1          5.1
#> 2          4.9
#> 3          4.7

iris %>% select(all_of(Sepal.Length)) %>% head(3)
#>   Sepal.Width
#> 1         3.5
#> 2         3.0
#> 3         3.2

For this reason, it is better to systematically use any_of() or all_of() when selecting with character variables.
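The same collision can hit the variable name from the original question. The sketch below assumes dplyr 1.0 or later and uses a made-up table, tbl_with_vars, that happens to contain a column literally named vars. The bare call resolves vars to that column, while all_of(vars) resolves it to the external character vector. Depending on the installed tidyselect version, a bare external vector such as select(vars) in the first post may also print a message or deprecation warning suggesting all_of(vars).

```r
library(dplyr)

vars <- c("a", "b")

# Hypothetical table that happens to contain a column literally named `vars`
tbl_with_vars <- tibble(a = 1, b = 2, vars = 3)

# Bare `vars`: column names take priority over environment variables,
# so this selects the column called `vars`, not "a" and "b"
bare <- tbl_with_vars %>% select(vars)
names(bare)       # "vars"

# all_of(vars): always uses the external character vector
explicit <- tbl_with_vars %>% select(all_of(vars))
names(explicit)   # "a" "b"
```

This matters most inside reusable functions, where you cannot know in advance which column names the caller's data will contain.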
