I'd love any advice folks have about this little reprex below. The general goal here is to take quoted or unquoted input from a function (called `f` here). The function `f` replaces a simple `group_by()`/`summarize()` process. This is a simplification of a more complicated workflow, meant to isolate an issue I've run into.
This process works well (see the first two tests below) for both unquoted and quoted inputs. It falls apart, however, when a string is stored in a variable (`x` in this case) and that variable is passed to `f`.
# dependencies
suppressMessages(library(dplyr))
library(ggplot2)
# simplified function
f <- function(.data, group, value){
  # save parameters to list
  paramList <- as.list(match.call())
  # nse
  if (!is.character(paramList$group)) {
    groupQ <- rlang::enquo(group)
  } else if (is.character(paramList$group)) {
    groupQ <- rlang::quo(!! rlang::sym(group))
  }
  if (!is.character(paramList$value)) {
    valueQ <- rlang::enquo(value)
  } else if (is.character(paramList$value)) {
    valueQ <- rlang::quo(!! rlang::sym(value))
  }
  # group and summarize
  .data %>%
    dplyr::group_by(!!groupQ) %>%
    dplyr::summarize(sum = base::sum(!!valueQ)) -> out
  # return output
  return(out)
}
# test unquoted input
mpg %>%
f(group = class, value = hwy)
#> # A tibble: 7 x 2
#> class sum
#> <chr> <int>
#> 1 2seater 124
#> 2 compact 1330
#> 3 midsize 1119
#> 4 minivan 246
#> 5 pickup 557
#> 6 subcompact 985
#> 7 suv 1124
# test quoted input
mpg %>%
f(group = "class", value = "hwy")
#> # A tibble: 7 x 2
#> class sum
#> <chr> <int>
#> 1 2seater 124
#> 2 compact 1330
#> 3 midsize 1119
#> 4 minivan 246
#> 5 pickup 557
#> 6 subcompact 985
#> 7 suv 1124
# test input via a stored value
x <- "class"
mpg %>%
f(group = x, value = "hwy")
#> Error in grouped_df_impl(data, unname(vars), drop): Column `x` is unknown
Created on 2018-12-06 by the reprex package (v0.2.1)
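One way to see how the third test differs from the first two is to look at what `match.call()` actually captures. This is only a minimal sketch: `inspect_arg()` is a hypothetical helper, not part of `f`, that repeats the same `as.list(match.call())` step and reports the class of the captured argument.

```r
# Hypothetical helper (not in the original f()): shows how each kind of
# input appears in the call captured by match.call()
inspect_arg <- function(group) {
  paramList <- as.list(match.call())
  class(paramList$group)
}

x <- "class"

unquoted <- inspect_arg(class)    # bare column name: captured as a symbol
quoted   <- inspect_arg("class")  # string literal: captured as a character
stored   <- inspect_arg(x)        # stored string: captured as the symbol `x`

print(c(unquoted = unquoted, quoted = quoted, stored = stored))
```

If this reading is right, `is.character(paramList$group)` is `FALSE` for `group = x`, so `f` takes the `enquo()` branch and `group_by()` receives the symbol `x` rather than the string it holds.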
What I find interesting is that this approach works with other `dplyr` functions, like `select()`. When I create a simple function `g` that can take quoted or unquoted input, it works with both, and it also works when the variable name is stored in a variable that is supplied for the appropriate argument.
# dependencies
suppressMessages(library(dplyr))
library(ggplot2)
# simplified function
g <- function(.data, value){
  # save parameters to list
  paramList <- as.list(match.call())
  # nse
  if (!is.character(paramList$value)) {
    valueQ <- rlang::enquo(value)
  } else if (is.character(paramList$value)) {
    valueQ <- rlang::quo(!! rlang::sym(value))
  }
  # select the column
  .data %>%
    dplyr::select(!!valueQ) -> out
  # return output
  return(out)
}
# test unquoted input
mpg %>%
g(value = hwy)
#> # A tibble: 234 x 1
#> hwy
#> <int>
#> 1 29
#> 2 29
#> 3 31
#> 4 30
#> 5 26
#> 6 26
#> 7 27
#> 8 26
#> 9 25
#> 10 28
#> # ... with 224 more rows
# test quoted input
mpg %>%
g(value = "hwy")
#> # A tibble: 234 x 1
#> hwy
#> <int>
#> 1 29
#> 2 29
#> 3 31
#> 4 30
#> 5 26
#> 6 26
#> 7 27
#> 8 26
#> 9 25
#> 10 28
#> # ... with 224 more rows
# test input via a stored value
x <- "hwy"
mpg %>%
g(value = x)
#> # A tibble: 234 x 1
#> hwy
#> <int>
#> 1 29
#> 2 29
#> 3 31
#> 4 30
#> 5 26
#> 6 26
#> 7 27
#> 8 26
#> 9 25
#> 10 28
#> # ... with 224 more rows
Created on 2018-12-06 by the reprex package (v0.2.1)
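A possible reason `g` tolerates the stored string is that `select()` treats its arguments as column selections, not as expressions to compute. The sketch below is an illustration under assumptions, not a confirmed diagnosis: `df` is a made-up data frame standing in for `mpg`, and the behavior shown is that of `select()` when given a name that is not a column.

```r
library(dplyr)

# Made-up stand-in for mpg (assumption, for illustration only)
df <- data.frame(hwy = c(29L, 31L, 26L), class = c("a", "b", "a"))

x <- "hwy"

# `x` is not a column of df, so select() looks it up in the calling
# environment and uses its value, "hwy", as the column name.
# Recent tidyselect versions may emit a deprecation message here that
# recommends all_of(x) instead.
selected <- names(select(df, x))
print(selected)
```

`group_by()`, by contrast, expects an expression that evaluates to the grouping values, so a symbol like `x` is not reinterpreted as a column name there.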
I'd love to know what is going on here and what I am missing with `group_by()`. Any help would be very much appreciated!