While using the separate() function from tidyr with colleagues who were new to the tidyverse (and R), I tried to explain why its arguments are specified the way they are, and became curious about when non-standard evaluation should be used in functions, and why.
In tidyr::separate(), for example, the column to be separated (the argument col) is provided without quotation marks, whereas the names of the columns to separate it into are provided as a character vector:
    library(tidyr)
    library(dplyr, warn.conflicts = FALSE)

    df <- data.frame(x = c(NA, "a.b", "a.d", "b.c"))
    df
    #>      x
    #> 1 <NA>
    #> 2  a.b
    #> 3  a.d
    #> 4  b.c

    df %>% separate(x, c("A", "B"))
    #>      A    B
    #> 1 <NA> <NA>
    #> 2    a    b
    #> 3    a    d
    #> 4    b    c
I don’t think this is unique to separate(), though maybe it is and there is a specific reason why.
I thought one reason might be that the column to be separated already exists in the data frame, whereas the new columns do not exist (yet), and that this is why the new column names are provided as a character vector. However, in other functions, like dplyr::mutate(), the names of the new variables / columns are provided without quotation marks, e.g. dplyr::mutate(iris, Sepal.Area = Sepal.Length * Sepal.Width).
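To make the contrast concrete, here is a minimal side-by-side sketch of the two calls. The object names separated and with_area are my own, chosen only for illustration. As far as I can tell, the new name in mutate() is simply written as an argument name on the left-hand side of =, which may be part of why it needs no quotation marks.

```r
library(tidyr)
library(dplyr, warn.conflicts = FALSE)

df <- data.frame(x = c(NA, "a.b", "a.d", "b.c"))

# separate(): the existing column is given unquoted,
# the new column names are given as a character vector.
separated <- df %>% separate(x, into = c("A", "B"))

# mutate(): the existing columns are given unquoted, and the new
# column name is also unquoted (it is written as an argument name).
with_area <- iris %>% mutate(Sepal.Area = Sepal.Length * Sepal.Width)

names(separated)                     # names of the columns created by separate()
"Sepal.Area" %in% names(with_area)   # was the new column created by mutate()?
```

In both calls the column that already exists is referred to without quotation marks; the two functions differ only in how the new names are supplied.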
I ask in part out of curiosity, and also because I would like my own use of non-standard evaluation to be consistent with how others use it and with how it is used in tidyverse packages. I also ask because, while there are good discussions and resources on the why of non-standard evaluation (via tidyeval) and the how, I am less familiar with guidance on the when.
Thank you for your pointers or feedback.