I'd love any advice folks have about the little reprex below. The general goal is to accept either quoted or unquoted input in a function (called f here). The function f replaces a simple group_by()/summarize() process. This is a simplification of a more complicated workflow, meant to isolate an issue I've run into.
This process works well (see the first two tests below) for both unquoted and quoted inputs. It falls apart, however, when a string is stored in a value (x in this case) and that value is used when calling f.
# dependencies
suppressMessages(library(dplyr))
library(ggplot2)
# simplified function
f <- function(.data, group, value){
  # save parameters to list
  paramList <- as.list(match.call())
  # nse
  if (!is.character(paramList$group)) {
    groupQ <- rlang::enquo(group)
  } else if (is.character(paramList$group)) {
    groupQ <- rlang::quo(!! rlang::sym(group))
  }
  if (!is.character(paramList$value)) {
    valueQ <- rlang::enquo(value)
  } else if (is.character(paramList$value)) {
    valueQ <- rlang::quo(!! rlang::sym(value))
  }
  # group and summarize
  .data %>%
    dplyr::group_by(!!groupQ) %>%
    dplyr::summarize(sum = base::sum(!!valueQ)) -> out
  # return output
  return(out)
}
# test unquoted input
mpg %>%
f(group = class, value = hwy)
#> # A tibble: 7 x 2
#> class sum
#> <chr> <int>
#> 1 2seater 124
#> 2 compact 1330
#> 3 midsize 1119
#> 4 minivan 246
#> 5 pickup 557
#> 6 subcompact 985
#> 7 suv 1124
# test quoted input
mpg %>%
f(group = "class", value = "hwy")
#> # A tibble: 7 x 2
#> class sum
#> <chr> <int>
#> 1 2seater 124
#> 2 compact 1330
#> 3 midsize 1119
#> 4 minivan 246
#> 5 pickup 557
#> 6 subcompact 985
#> 7 suv 1124
# test input via a stored value
x <- "class"
mpg %>%
f(group = x, value = "hwy")
#> Error in grouped_df_impl(data, unname(vars), drop): Column `x` is unknown
What I find interesting is that this approach works with other dplyr functions, like select. When I create a simple function g that can take quoted or unquoted input, it works with both as well as when the variable name is stored in a value that is supplied for the appropriate argument.
# dependencies
suppressMessages(library(dplyr))
library(ggplot2)
# simplified function
g <- function(.data, value){
  # save parameters to list
  paramList <- as.list(match.call())
  # nse
  if (!is.character(paramList$value)) {
    valueQ <- rlang::enquo(value)
  } else if (is.character(paramList$value)) {
    valueQ <- rlang::quo(!! rlang::sym(value))
  }
  # select the column
  .data %>%
    dplyr::select(!!valueQ) -> out
  # return output
  return(out)
}
# test unquoted input
mpg %>%
g(value = hwy)
#> # A tibble: 234 x 1
#> hwy
#> <int>
#> 1 29
#> 2 29
#> 3 31
#> 4 30
#> 5 26
#> 6 26
#> 7 27
#> 8 26
#> 9 25
#> 10 28
#> # ... with 224 more rows
# test quoted input
mpg %>%
g(value = "hwy")
#> # A tibble: 234 x 1
#> hwy
#> <int>
#> 1 29
#> 2 29
#> 3 31
#> 4 30
#> 5 26
#> 6 26
#> 7 27
#> 8 26
#> 9 25
#> 10 28
#> # ... with 224 more rows
# test input via a stored value
x <- "hwy"
mpg %>%
g(value = x)
#> # A tibble: 234 x 1
#> hwy
#> <int>
#> 1 29
#> 2 29
#> 3 31
#> 4 30
#> 5 26
#> 6 26
#> 7 27
#> 8 26
#> 9 25
#> 10 28
#> # ... with 224 more rows
I ended up finding a solution (see the third test at the bottom of the reprex). To pass the stored string "class" into the function, the input needs to be turned into a quosure before being passed as an argument. Once it is in ~class form, it is unquoted in the call to f with bang-bang (!!).
# dependencies
suppressMessages(library(dplyr))
library(ggplot2)
# simplified function
f <- function(.data, group, value){
  # save parameters to list
  paramList <- as.list(match.call())
  # nse
  if (!is.character(paramList$group)) {
    groupQ <- rlang::enquo(group)
  } else if (is.character(paramList$group)) {
    groupQ <- rlang::quo(!! rlang::sym(group))
  }
  if (!is.character(paramList$value)) {
    valueQ <- rlang::enquo(value)
  } else if (is.character(paramList$value)) {
    valueQ <- rlang::quo(!! rlang::sym(value))
  }
  # group and summarize
  .data %>%
    dplyr::group_by(!!groupQ) %>%
    dplyr::summarize(sum = base::sum(!!valueQ)) -> out
  # return output
  return(out)
}
# test unquoted input
mpg %>%
f(group = class, value = hwy)
#> # A tibble: 7 x 2
#> class sum
#> <chr> <int>
#> 1 2seater 124
#> 2 compact 1330
#> 3 midsize 1119
#> 4 minivan 246
#> 5 pickup 557
#> 6 subcompact 985
#> 7 suv 1124
# test quoted input
mpg %>%
f(group = "class", value = "hwy")
#> # A tibble: 7 x 2
#> class sum
#> <chr> <int>
#> 1 2seater 124
#> 2 compact 1330
#> 3 midsize 1119
#> 4 minivan 246
#> 5 pickup 557
#> 6 subcompact 985
#> 7 suv 1124
# test input via a stored value
x <- "class"
xQ <- rlang::quo(!! rlang::sym(x))
mpg %>%
f(group = !!xQ, value = "hwy")
#> # A tibble: 7 x 2
#> class sum
#> <chr> <int>
#> 1 2seater 124
#> 2 compact 1330
#> 3 midsize 1119
#> 4 minivan 246
#> 5 pickup 557
#> 6 subcompact 985
#> 7 suv 1124
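A slightly shorter call-site variant also seems to work: since enquo() supports unquoting, the symbol built by rlang::sym(x) can be unquoted directly in the call, without wrapping it in quo() first. This is only a sketch that repeats the same f as above; I have not checked it against every rlang version.

```r
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)

# same f as in the reprex above
f <- function(.data, group, value){
  paramList <- as.list(match.call())
  if (!is.character(paramList$group)) {
    groupQ <- rlang::enquo(group)
  } else {
    groupQ <- rlang::quo(!! rlang::sym(group))
  }
  if (!is.character(paramList$value)) {
    valueQ <- rlang::enquo(value)
  } else {
    valueQ <- rlang::quo(!! rlang::sym(value))
  }
  .data %>%
    dplyr::group_by(!!groupQ) %>%
    dplyr::summarize(sum = base::sum(!!valueQ))
}

x <- "class"

# unquote the symbol directly; enquo() inside f sees `class`, not `x`
out_sym <- f(mpg, group = !!rlang::sym(x), value = "hwy")
```

The result should match the unquoted call f(mpg, group = class, value = hwy).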
I think this is due to the way you handle NSE inside your f function. When you define x <- "class" and call f(group = x, ...), match.call() records the symbol x, not the string it holds. So is.character(paramList$group) is FALSE and the first branch runs: groupQ <- rlang::enquo(group). groupQ ends up being a quosure of x, and group_by then looks for a column named x, which does not exist. The input needs to go through your else if clause instead, groupQ <- rlang::quo(!! rlang::sym(group)), and that is what you end up doing outside the function with xQ <- rlang::quo(!! rlang::sym(x)). So if you modify your f function to deal with x <- "class", it should work. When the input is provided as a character string, it should pass through sym or ensym.
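One way to make that modification, as a rough sketch rather than a definitive fix: capture the argument with enquo(), and if it is a bare symbol that is not a column of the data, evaluate it and treat a resulting string as a column name. The helper resolve_col and the name f2 are made up for this illustration. Note the design choice that a column name wins over a variable of the same name in the calling environment.

```r
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)

# hypothetical helper: turn a captured argument into something group_by()
# and summarize() can unquote
resolve_col <- function(quo, data) {
  expr <- rlang::quo_get_expr(quo)
  # literal string, e.g. group = "class"
  if (is.character(expr)) {
    return(rlang::sym(expr))
  }
  # bare symbol that is not a column, e.g. group = x with x <- "class"
  if (is.symbol(expr) && !(rlang::as_string(expr) %in% names(data))) {
    val <- rlang::eval_tidy(quo)
    if (is.character(val) && length(val) == 1) {
      return(rlang::sym(val))
    }
  }
  # anything else (bare column name, expression) is passed through untouched
  quo
}

f2 <- function(.data, group, value) {
  groupQ <- rlang::enquo(group)
  valueQ <- rlang::enquo(value)
  groupQ <- resolve_col(groupQ, .data)
  valueQ <- resolve_col(valueQ, .data)
  .data %>%
    dplyr::group_by(!!groupQ) %>%
    dplyr::summarize(sum = base::sum(!!valueQ))
}

x <- "class"
res_bare   <- f2(mpg, group = class, value = hwy)
res_string <- f2(mpg, group = "class", value = "hwy")
res_stored <- f2(mpg, group = x, value = "hwy")
```

All three calls should return the same seven-row summary shown earlier in the thread.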
Note that you could also use the variant group_by_at, which works with character vectors or with column names generated by vars(). It could be very useful in your case.
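As a small sketch of that idea, assuming you are happy to accept only character input: the wrapper below (f_at is a made-up name) passes the string straight to group_by_at and converts the value column with sym. In recent dplyr versions the *_at variants are superseded but still available.

```r
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)

# hypothetical wrapper: group and value are both given as strings
f_at <- function(df, group, value) {
  df %>%
    dplyr::group_by_at(group) %>%
    dplyr::summarize(sum = base::sum(!!rlang::sym(value)))
}

x <- "class"
by_string <- f_at(mpg, group = x, value = "hwy")

# the same grouping written with vars() and a bare column name
by_vars <- mpg %>%
  dplyr::group_by_at(dplyr::vars(class)) %>%
  dplyr::summarize(sum = base::sum(hwy))
```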
Here are some examples to help show how NSE works here.
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)
x <- "class"
# effect of tidy evaluation
rlang::qq_show(!!quo(x))
#> ^x
rlang::qq_show(quo(!!x))
#> quo("class")
rlang::qq_show(sym(x))
#> sym(x)
rlang::qq_show(!!sym(x))
#> class
rlang::qq_show(sym(!!x))
#> sym("class")
rlang::qq_show(!!x)
#> "class"
# does not work because mpg has no column named x
mpg %>%
dplyr::group_by(!!quo(x))
#> Error in grouped_df_impl(data, unname(vars), drop): Column `x` is unknown
# works because class is a symbol
mpg %>%
dplyr::group_by(!!sym(x))
#> # A tibble: 234 x 11
#> # Groups: class [7]
#> manufacturer model displ year cyl trans drv cty hwy fl cla~
#> <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <ch>
#> 1 audi a4 1.8 1999 4 auto~ f 18 29 p com~
#> 2 audi a4 1.8 1999 4 manu~ f 21 29 p com~
#> 3 audi a4 2 2008 4 manu~ f 20 31 p com~
#> 4 audi a4 2 2008 4 auto~ f 21 30 p com~
#> 5 audi a4 2.8 1999 6 auto~ f 16 26 p com~
#> 6 audi a4 2.8 1999 6 manu~ f 18 26 p com~
#> 7 audi a4 3.1 2008 6 auto~ f 18 27 p com~
#> 8 audi a4 q~ 1.8 1999 4 manu~ 4 18 26 p com~
#> 9 audi a4 q~ 1.8 1999 4 auto~ 4 16 25 p com~
#> 10 audi a4 q~ 2 2008 4 manu~ 4 20 28 p com~
#> # ... with 224 more rows
# works because the *_at variants know how to deal with character vectors
mpg %>%
dplyr::group_by_at(.vars = x)
#> # A tibble: 234 x 11
#> # Groups: class [7]
#> manufacturer model displ year cyl trans drv cty hwy fl cla~
#> <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <ch>
#> 1 audi a4 1.8 1999 4 auto~ f 18 29 p com~
#> 2 audi a4 1.8 1999 4 manu~ f 21 29 p com~
#> 3 audi a4 2 2008 4 manu~ f 20 31 p com~
#> 4 audi a4 2 2008 4 auto~ f 21 30 p com~
#> 5 audi a4 2.8 1999 6 auto~ f 16 26 p com~
#> 6 audi a4 2.8 1999 6 manu~ f 18 26 p com~
#> 7 audi a4 3.1 2008 6 auto~ f 18 27 p com~
#> 8 audi a4 q~ 1.8 1999 4 manu~ 4 18 26 p com~
#> 9 audi a4 q~ 1.8 1999 4 auto~ 4 16 25 p com~
#> 10 audi a4 q~ 2 2008 4 manu~ 4 20 28 p com~
#> # ... with 224 more rows
Thanks @cderv - this is super helpful. I haven't used the *_at functions before, and also didn't know about rlang::qq_show. Much easier than how I've been debugging quasi quotation...