Applying enquo functions to multiple columns of a dataframe

Hello!

I am trying to streamline some dplyr code where I have repeated a set of calculations on a long list of variables in a dataset. This is my first reprex, so hopefully it conveys where I am at using the iris dataset.

library(tidyverse)

get_max_min <- function(df,feature) {
  
  feature <- enquo(feature)
  feature_name <- quo_name(feature)
  
  df %>% 
    group_by(Species) %>% 
    summarize(!! paste0("max_", feature_name) := max(!! feature),
              !! paste0("min_", feature_name) := min(!! feature))
  
}

iris %>% get_max_min(Sepal.Width)
#> # A tibble: 3 x 3
#>   Species    max_Sepal.Width min_Sepal.Width
#>   <fct>                <dbl>           <dbl>
#> 1 setosa                 4.4             2.3
#> 2 versicolor             3.4             2  
#> 3 virginica              3.8             2.2

I would now like to run get_max_min on each of the variables in iris (other than Species) and combine everything into one table grouped by Species.

The actual calculations I'm doing are a bit more verbose than this, so I realize that there may be a simpler way of doing max/min without all the enquo stuff. But just for the general strategy, is there a succinct way to apply a function of this style to a subset of the columns in a dataframe?

Is there some alternative strategy to this that anyone might recommend?

My approach trades away some control over the constructed variable names, but is quite straightforward:

library(tidyverse)
get_max_min <- function(df,...) {
  
  features <- enquos(...)
  df %>% 
    group_by(Species) %>% 
    summarize_at(.vars = features,
                 .funs = list(~max(.),~min(.)))
  
}

iris %>% get_max_min(Sepal.Width,Sepal.Length)
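
If control over the constructed names matters, here is a minimal sketch of the same idea using across(), assuming dplyr 1.0 or later is installed. The function name get_max_min_across is hypothetical and only there to keep it apart from the versions above; the .names template restores the max_/min_ prefixes from the original question.

```r
library(dplyr)

# Hypothetical variant of get_max_min(); assumes dplyr >= 1.0 for across()
get_max_min_across <- function(df, ...) {
  df %>%
    group_by(Species) %>%
    summarize(
      # Apply both functions to every selected column and name the
      # results "<function>_<column>", e.g. max_Sepal.Width
      across(c(...), list(max = max, min = min), .names = "{.fn}_{.col}"),
      .groups = "drop"
    )
}

res <- get_max_min_across(iris, Sepal.Width, Sepal.Length)
res

# everything() selects all non-grouping columns, i.e. all but Species
res_all <- get_max_min_across(iris, everything())
```

Because the dots are passed to a tidyselect context, helpers such as starts_with("Sepal") should also work in place of bare column names.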


I think I have sorted out a solution that works for my peculiar needs at the moment. The important thing for me was understanding that I can feed a string into !! sym(...) within the function call in order to get the quosure function to work like I was expecting it to.

So then I could create a dataframe listing all the permutations I wanted to look at, populate a list of dataframes within it in a loop, and then reduce all those separate dataframes into one combined frame:

library(tidyverse)

get_max_min <- function(df,feature) {
  
  feature <- enquo(feature)
  feature_name <- quo_name(feature)
  
  df %>% 
    group_by(Species) %>% 
    summarize(!! paste0("max_", feature_name) := max(!! feature),
              !! paste0("min_", feature_name) := min(!! feature))
  }

# Going to construct permutations of object and measure to input to get_max_min
object <- c("Sepal", "Petal")
measure <- c("Width", "Length")

dfs <- expand.grid(object = object,measure = measure)
dfs <- dfs %>% mutate(features = paste0(object,".",measure ))

for (i in seq_along(dfs$features)) {
  txt_feature <- dfs$features[[i]]
  dfs$Max_Min_Output[[i]] <- get_max_min(iris, !! sym(txt_feature))   # THIS IS THE THING THAT GLUED IT TOGETHER
}

df_combined <- reduce(dfs$Max_Min_Output, full_join, by = "Species")

IRL I'm going to end up with ~16 features summarized in a bunch of different ways: taking max, min, and snapshot values when varying conditions are met, and then aggregating them into common buckets. The iris example probably doesn't convey how messy it would get, but I think this should work for me.

EDIT to say: as I look at this I wonder if I have done something incredibly convoluted and unnecessary by taking !! enquo(!! sym()). I started out trying to rewrite a bunch of code in this way as the requirements of the project keep expanding (and my code has been doubling each time). I also wanted to get a handle on quasiquotation. It's working now, but something about it feels not-quite-right. If anyone has any simplifying perspective on this it would be welcome. I'm not far into learning R and some of this stuff warps my brain.
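
One possible simplification, sketched under the assumption that the feature names are already strings: have the function take a string and look the column up with the .data pronoun, so there is no sym()/enquo() round trip. The name get_max_min_chr is hypothetical, and .groups = "drop" assumes dplyr 1.0 or later.

```r
library(dplyr)
library(purrr)

# Hypothetical string-based variant: `feature` is a column name as a string
get_max_min_chr <- function(df, feature) {
  df %>%
    group_by(Species) %>%
    summarize(
      # .data[[feature]] pulls the column named by the string
      !!paste0("max_", feature) := max(.data[[feature]]),
      !!paste0("min_", feature) := min(.data[[feature]]),
      .groups = "drop"
    )
}

# Every column except the grouping variable
features <- setdiff(names(iris), "Species")

# One summary table per feature, then joined on Species
df_combined <- features %>%
  map(~ get_max_min_chr(iris, .x)) %>%
  reduce(full_join, by = "Species")
```

This keeps the per-feature function and the reduce(full_join) step from the loop version, but replaces the for loop and the intermediate list column with a single map() call.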

Thank you for this approach. I didn't previously understand the power and flexibility of summarize_at and need to spend some time tinkering with this to see how I can get it implemented in my idiosyncratic data process. Will post an update at the end of the day...

Much appreciated!