Passing a column name as a function parameter into a map function

I'm new to packages like purrr and rlang. I'm trying to write a function that takes a dataset, a column name to group by, and a column name to get quantiles for. This is what I have so far:

library(tidyverse)

create_quantile_dfs <- function(data, group_col, metric_col, quantile_vector = c(0.01, seq(.05, .95, .05), .99)) {

  # get the number of groups
  num_variants <- data %>% select(!! group_col) %>% unique() %>% nrow()
  
  df_quantiles <- data %>%
    # a sort of groupby that allows functional programming on the other column???
    nest(- !!group_col) %>%
    # get the quantiles then revert back to the dataframe we're used to
    mutate(quantiles = map(data, ~ quantile(.$conc, na.rm=TRUE,
                                            probs = quantile_vector)),
           quantiles = map(quantiles, ~ bind_rows(.) %>% gather())) %>%
    unnest(quantiles)
  
  # label quantile values with the tau value
  quantile_key <- as.character(quantile_vector)
  df_quantiles$quantile_key <- rep(quantile_key, num_variants)
  
  return(df_quantiles)
}

create_quantile_dfs(CO2, quo(Treatment), quo(conc))

But I can't find a way to get rid of the explicit column name in map (map(data, ~ quantile(.$conc, ...))). I'd like to use the function parameter metric_col instead. I don't understand quo and !! well enough under the hood to see why they won't play well with .$. Please help if possible, and ty!
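For reference, here is a minimal sketch of one way to keep the nest()/map() structure while passing the metric column in. It assumes current dplyr and tidyr (tidyr 1.0 or later) and that both columns are supplied as bare names rather than wrapped in quo(); the name quantiles_via_map is hypothetical. The underlying issue is that the $ operator never evaluates its right-hand side, so .$metric_col looks for a column literally called metric_col. Converting the captured column to a string with rlang::as_name() and indexing with [[ avoids that.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)

# Hypothetical helper: the question's nest()/map() idea, but with the
# metric column passed as a bare name instead of being hard-coded.
quantiles_via_map <- function(data, group_col, metric_col,
                              quantile_vector = c(0.01, seq(.05, .95, .05), .99)) {
  group_col <- rlang::enquo(group_col)
  # `$` takes its right-hand side literally, so turn the captured column
  # into a string (e.g. "conc") and index with `[[` instead.
  metric_name <- rlang::as_name(rlang::enquo(metric_col))

  data %>%
    group_by(!!group_col) %>%
    nest() %>%        # one row per group; each group's rows go in list-column `data`
    ungroup() %>%
    mutate(quantiles = map(data, ~ enframe(
      quantile(.x[[metric_name]], probs = quantile_vector, na.rm = TRUE),
      name = "quantile", value = metric_name   # labels like "5%", values column named after the metric
    ))) %>%
    select(-data) %>%
    unnest(quantiles)
}

quantiles_via_map(CO2, Treatment, conc)
```

The quantile labels come straight from the names that quantile() returns, so there is no need to rebuild them with rep() afterwards.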

This looks like a case where you'd want to use group_by (rather than map) to operate by group.

In the code below:*

  • The ... allows you to enter any number of grouping columns (or none) rather than just one.
  • The calls to enquo and enquos are how you capture unevaluated arguments in the tidyeval system. These arguments are later evaluated with !! for a single argument (e.g., !!value.col) or !!! for multiple arguments (e.g., !!!group.cols). enquos(...) captures all of the grouping variables as a list of quosures.
  • The quantiles are generated by group within summarise.
  • Within summarise:
    • quantile(!!value.col, probs=probs) generates the quantiles for whatever column was entered as value.col.
    • enframe converts the named vector returned by quantile into a two-column data frame, and lets us choose the names of the name and value columns.
    • The whole thing is wrapped in list, which stores each group's data frame in a list-column (a nested data frame). We then unnest to return the final data frame.
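To see the capture-and-unquote steps in isolation, here is a minimal sketch using a hypothetical mean_by_group() helper (not part of the function in this answer), followed by the enframe() step applied on its own to quantile()'s output. It assumes dplyr and tibble are installed.

```r
library(dplyr)
library(tibble)

# Hypothetical helper showing the enquo/enquos + !!/!!! pattern on its own.
mean_by_group <- function(data, value.col, ...) {
  value.col  <- rlang::enquo(value.col)   # one bare column name -> one quosure
  group.cols <- rlang::enquos(...)        # any number of names -> list of quosures

  data %>%
    group_by(!!!group.cols) %>%           # !!! splices the whole list (possibly empty)
    summarise(mean = mean(!!value.col))   # !! unquotes a single quosure
}

mean_by_group(mtcars, mpg, cyl)   # one row per value of cyl
mean_by_group(mtcars, mpg)        # no grouping columns: a single row

# enframe() turns a named vector, such as quantile()'s output,
# into a two-column data frame with the column names we pick.
enframe(quantile(mtcars$mpg, probs = c(0.25, 0.75)),
        name = "quantile", value = "mpg")
```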
quantiles_by_group = function(data, value.col, ..., probs=c(0.01, seq(.05, .95, .05), .99)) {
  
  value.col=enquo(value.col)
  group.cols=enquos(...)
  
  data %>% 
    group_by(!!!group.cols) %>% 
    summarise(!!value.col := list(enframe(quantile(!!value.col, probs=probs), name="quantile", value=quo_text(value.col)))) %>% 
    unnest
}

quantiles_by_group(CO2, conc, Treatment)
quantiles_by_group(CO2, conc) # No grouping variables
quantiles_by_group(CO2, conc, Treatment, Type, Plant, probs=c(0.25, 0.75)) # Multiple grouping variables
quantiles_by_group(mtcars, mpg, cyl)
quantiles_by_group(iris, Petal.Width, Species)

When working with tidy evaluation, I usually feel like I'm walking around blindfolded, so I can't guarantee that this approach is the "right" way to do it, but at least it works.

The function can be generalized further to summarize all numeric columns using summarise_if instead of summarise. Note below that to extract the name of each numeric column we use quo_text(quo(.)). I originally thought the appropriate incantation would be quo_text(enquo(.)), but due to my limited understanding of tidyeval, I'm not sure why one works and the other doesn't.

quantiles_by_group2 = function(data, ..., probs=c(0.25, 0.75)) {
  
  group.cols=enquos(...)
  
  data %>% 
    group_by(!!!group.cols) %>% 
    # Get quantiles for all numeric columns
    summarise_if(is.numeric, 
                 funs(
                   list(
                     enframe(
                       quantile(., probs=probs), 
                       name="quantile", 
                       value=quo_text(quo(.))
                     )
                   )
                 )
    ) %>% 
    unnest %>% 
    # Remove the repeated quantile columns
    select(-matches("quantile."))
}
quantiles_by_group2(iris, Species)

  Species    quantile Sepal.Length Sepal.Width Petal.Length Petal.Width
  <fct>      <chr>           <dbl>       <dbl>        <dbl>       <dbl>
1 setosa     25%              4.8         3.2          1.4          0.2
2 setosa     75%              5.2         3.68         1.58         0.3
3 versicolor 25%              5.6         2.52         4            1.2
4 versicolor 75%              6.3         3            4.6          1.5
5 virginica  25%              6.22        2.8          5.1          1.8
6 virginica  75%              6.9         3.18         5.88         2.3
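In dplyr 1.0 and later, funs() is deprecated and summarise_if() is superseded by across(). Below is a sketch of the same idea with across(), assuming dplyr 1.1.0 or later for reframe(), which allows more than one row per group; the name quantiles_by_group_across is hypothetical. Because across() keeps each input column's name for its result, this version does not need the quo_text(quo(.)) trick or the select() that drops repeated quantile columns. The "25%"-style labels are built with paste0(), which is an assumption that may format unusual probs differently from quantile()'s own names.

```r
library(dplyr)  # reframe() requires dplyr >= 1.1.0

# Hypothetical helper: quantiles of every numeric column, by group.
quantiles_by_group_across <- function(data, ..., probs = c(0.25, 0.75)) {
  data %>%
    group_by(...) %>%                           # forward any number of grouping columns
    reframe(
      quantile = paste0(probs * 100, "%"),      # labels such as "25%", "75%"
      # stats:: avoids any confusion with the new `quantile` column above
      across(where(is.numeric), ~ stats::quantile(.x, probs = probs, names = FALSE))
    )
}

quantiles_by_group_across(iris, Species)
```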

* Which I've adapted from an answer I wrote on Stack Overflow a while back.

Awesome! I love how you explained the different parts of what your function does. I need to read up on these packages to feel more comfortable going forward, haha. Thanks again for your help!