Hello guys.
I am a bit lost here and cannot find the solution.
Please see the code below.
I have a data frame and I want to select specific columns.
Naturally this is dummy code, but in my actual problem the column names come from an earlier operation and get stored in a variable.
How do I make the data frame "fastDummies_example" return only the columns listed in the variable "variables_I_want"?
vinaychuri's solution will work. However, I would recommend using all_of() when subsetting a data frame using variable names stored as strings. This is more robust and avoids problems such as unexpected data masking.
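As a rough sketch of what that looks like with the names from the question: the data frame below is a made-up stand-in, since the real columns of fastDummies_example are not shown in this thread, and the contents of variables_I_want are assumed for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the OP's data; the real columns are not shown here.
fastDummies_example <- data.frame(
  numbers = 1:3,
  gender  = c("male", "male", "female"),
  animals = c("dog", "dog", "cat"),
  stringsAsFactors = FALSE
)

# Column names produced by some earlier operation and stored as strings
# (assumed values, for illustration only).
variables_I_want <- c("numbers", "animals")

# all_of() marks the vector as an external list of column names,
# so select() never mistakes it for a column of the data frame.
result <- select(fastDummies_example, all_of(variables_I_want))
names(result)
```

Note that all_of() is strict: if any name in the vector is not a column of the data frame, select() stops with an error instead of silently dropping it.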
Sure Anirban. I'll post here as the OP may find it useful.
The issue is that data variables always have priority and can end up masking environment variables if they have the same name. Consider the example below.
my_mtcars <- mtcars[1:4, ]
vars <- c("cyl", "am", "vs")
# This works (with a note).
dplyr::select(my_mtcars, vars)
#> Note: Using an external vector in selections is ambiguous.
#> i Use `all_of(vars)` instead of `vars` to silence this message.
#> i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
#> This message is displayed once per session.
#>                cyl am vs
#> Mazda RX4        6  1  0
#> Mazda RX4 Wag    6  1  0
#> Datsun 710       4  1  1
#> Hornet 4 Drive   6  0  1
# But let's say my_mtcars contains a column named vars.
my_mtcars$vars <- 1:4
# This gives a different result now because the data variable vars masks the
# environment variable vars.
dplyr::select(my_mtcars, vars)
#>                vars
#> Mazda RX4         1
#> Mazda RX4 Wag     2
#> Datsun 710        3
#> Hornet 4 Drive    4
# To disambiguate and force the environment variable, use all_of().
dplyr::select(my_mtcars, all_of(vars))
#>                cyl am vs
#> Mazda RX4        6  1  0
#> Mazda RX4 Wag    6  1  0
#> Datsun 710       4  1  1
#> Hornet 4 Drive   6  0  1
This ambiguity is usually not a problem in interactive data analysis, where you know which variables your data contains, but it is very relevant for package development, since you have no idea which variables will be present in the data a user passes in.
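To make the package-development case concrete, here is a minimal sketch of a wrapper function. The names keep_columns, data and cols are hypothetical and only serve the illustration; the data frame is deliberately built with a column called "cols" to provoke the clash.

```r
library(dplyr)

# Hypothetical helper: keep only the columns named in `cols`.
keep_columns <- function(data, cols) {
  # all_of() forces `cols` to be looked up as the function argument,
  # even if `data` happens to contain a column that is also called "cols".
  dplyr::select(data, dplyr::all_of(cols))
}

# A user's data frame that contains a column named like the argument.
tricky <- data.frame(cols = 1:2, a = 3:4, b = 5:6)

out <- keep_columns(tricky, c("a", "b"))
names(out)
```

Without all_of(), the bare `cols` inside select() would be resolved against the data first and the function would return the "cols" column instead of "a" and "b".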
The tidyverse maintainers have indicated that passing an external character vector to a selection, without explicitly saying whether you mean a data variable or an environment variable, will be deprecated at some point in the future, so it is a good idea to start using all_of() in these situations.