This looks like a case where you'd want to use `group_by` (rather than `map`) to operate by group.
In the code below:*
- The `...` allows you to enter any number of grouping columns (or none) rather than just one.
- The calls to `enquo` and `enquos` are how you capture unevaluated arguments in the tidyeval system. These arguments are later evaluated with `!!` for a single argument (e.g., `!!value.col`) or `!!!` for multiple arguments (e.g., `!!!group.cols`). `enquos(...)` captures all of the grouping variables as a list of quosures.
- The quantiles are generated by group within `summarise`.
- Within `summarise`:
    - `quantile(!!value.col, probs=probs)` generates the quantiles for whatever column was entered as `value.col`.
    - `enframe` converts the named vector returned by `quantile` to a data frame with two columns, whose names we choose through its `name` and `value` arguments (here `quantile` and the name of the value column).
    - The whole thing is wrapped in `list`, which results in a nested data frame (a list-column). Then we `unnest` to return the final desired data frame.
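Here is a minimal sketch of the per-group step from the list above, run on a single ungrouped column so that each stage can be inspected on its own. It uses `CO2$conc` with just two probabilities chosen for illustration, and it names the list-column to unnest through `cols`, which assumes tidyr 1.0 or later.

```r
library(tibble)
library(tidyr)

# quantile() returns a named numeric vector; the names ("25%", "75%") label the probabilities
q <- quantile(CO2$conc, probs = c(0.25, 0.75))

# enframe() moves the names into one column and the values into another;
# the column names are whatever we pass as `name` and `value`
q_df <- enframe(q, name = "quantile", value = "conc")

# Wrapping the tibble in list() stores it in a single list-column cell,
# which is what summarise() produces for each group
nested <- tibble(conc = list(q_df))

# unnest() expands the list-column back into ordinary rows and columns
unnested <- unnest(nested, cols = conc)
unnested
```

Inside `quantiles_by_group`, the same thing happens once per group, so the unnested result simply gains the grouping columns in front.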
```r
library(dplyr)
library(tidyr)
library(tibble)
library(rlang)

quantiles_by_group = function(data, value.col, ..., probs=c(0.01, seq(.05, .95, .05), .99)) {
  value.col = enquo(value.col)
  group.cols = enquos(...)
  data %>%
    group_by(!!!group.cols) %>%
    summarise(!!value.col := list(enframe(quantile(!!value.col, probs=probs),
                                          name="quantile", value=quo_text(value.col)))) %>%
    # Name the list-column to unnest explicitly (tidyr >= 1.0 warns when `cols` is missing)
    unnest(cols = c(!!value.col))
}

quantiles_by_group(CO2, conc, Treatment)
quantiles_by_group(CO2, conc) # No grouping variables
quantiles_by_group(CO2, conc, Treatment, Type, Plant, probs=c(0.25, 0.75)) # Multiple grouping variables
quantiles_by_group(mtcars, mpg, cyl)
quantiles_by_group(iris, Petal.Width, Species)
```
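To check what `enquo` and `enquos` actually capture for calls like the ones above, a small diagnostic function can return the captured arguments as text. `show_captured` is a hypothetical helper written only for this illustration; it mirrors the signature of `quantiles_by_group` but never touches the data.

```r
library(rlang)

# Hypothetical helper: same argument layout as quantiles_by_group(),
# but it only reports what was captured instead of computing anything
show_captured <- function(data, value.col, ...) {
  value.col <- enquo(value.col)   # a single quosure
  group.cols <- enquos(...)       # a list of quosures, possibly empty
  list(
    value  = quo_text(value.col),
    groups = unname(vapply(group.cols, quo_text, character(1)))
  )
}

show_captured(CO2, conc, Treatment, Type, Plant)
show_captured(CO2, conc)
```

The second call returns an empty character vector for `groups`, which is why `group_by(!!!group.cols)` also covers the ungrouped case: splicing an empty list passes no grouping columns at all.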
When working with tidy evaluation, I usually feel like I'm walking around blindfolded, so I can't guarantee that this approach is the "right" way to do it, but at least it works.
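The function can be generalized further to summarize all numeric columns by using `summarise_if` instead of `summarise`. Note below that to extract the name of each numeric column we use `quo_text(quo(.))`. I originally thought the appropriate incantation would be `quo_text(enquo(.))`, but due to my limited understanding of tidyeval, I'm not sure why one works and the other doesn't. A likely explanation is that `funs()` replaces `.` with each column's name before the expression is evaluated, so `quo(.)` becomes, e.g., `quo(Sepal.Length)`, while `enquo()` only works on arguments of the function it is called from.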
```r
quantiles_by_group2 = function(data, ..., probs=c(0.25, 0.75)) {
  group.cols = enquos(...)
  data %>%
    group_by(!!!group.cols) %>%
    # Get quantiles for all numeric columns
    summarise_if(is.numeric,
                 funs(
                   list(
                     enframe(
                       quantile(., probs=probs),
                       name="quantile",
                       value=quo_text(quo(.))
                     )
                   )
                 )
    ) %>%
    unnest %>%
    # Remove the repeated quantile columns
    select(-matches("quantile."))
}
```
```r
quantiles_by_group2(iris, Species)
```

```
  Species    quantile Sepal.Length Sepal.Width Petal.Length Petal.Width
  <fct>      <chr>           <dbl>       <dbl>        <dbl>       <dbl>
1 setosa     25%              4.8         3.2          1.4          0.2
2 setosa     75%              5.2         3.68         1.58         0.3
3 versicolor 25%              5.6         2.52         4            1.2
4 versicolor 75%              6.3         3            4.6          1.5
5 virginica  25%              6.22        2.8          5.1          1.8
6 virginica  75%              6.9         3.18         5.88         2.3
```
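`funs()` has since been deprecated and `summarise_if()` superseded in dplyr. The same idea can be sketched with `across()` and `reframe()`, assuming dplyr 1.1.0 or later (where `reframe()` was introduced). This is not the original code: `quantiles_by_group3` is a hypothetical name, the `quantile` label column is built from `probs` rather than taken from the names that `quantile()` returns, and `...` is forwarded straight to `group_by()` instead of going through `enquos()` and `!!!`.

```r
library(dplyr)

# Sketch assuming dplyr >= 1.1.0 (for reframe()); not the original answer's code
quantiles_by_group3 <- function(data, ..., probs = c(0.25, 0.75)) {
  data %>%
    # Forwarding ... directly to group_by() works without enquos()/!!!
    group_by(...) %>%
    # reframe() lets each group return several rows (one per probability)
    reframe(
      # Label column built from probs, mimicking quantile()'s "25%" style names
      quantile = paste0(100 * probs, "%"),
      # One output column per numeric input column, keeping its original name
      across(where(is.numeric), ~ unname(stats::quantile(.x, probs = probs)))
    )
}

quantiles_by_group3(iris, Species)
```

Because each numeric column is summarised directly into its own output column, there are no repeated `quantile` columns to remove afterwards.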
* Which I've adapted from an answer I wrote on Stack Overflow a while back.