Dynamically constructing function calls using dplyr with NSE

lalush · March 29, 2020, 7:23pm

I want to be able to construct function calls dynamically with varying grouping variables/arguments using dplyr. The number of function calls may be quite large, which means the examples in the programming with dplyr vignette are not practical. Ideally I want to be able to construct an object (e.g. a list) beforehand which stores the arguments/variables to be passed in each function call.

I've written up a detailed explanation of my question on Stack Overflow. I wanted to post it here too in order to draw some attention to the question.

What I want to do essentially is call a function multiple times, where each time a new list or vector of variable names is passed to the function. Inside the function there should be some rlang black magic making group_by() accept these variables as grouping variables.

dromano · March 29, 2020, 7:33pm

I read your SO post, but found it pretty dense -- would you be able to boil it down to something like: "Here's an example of what I'd like to start with, and this is the result I'd like to achieve."? It sounds like an interesting challenge, but it is difficult to follow.

lalush · March 29, 2020, 8:07pm

The problem boils down to this: in the "programming with dplyr" vignette, all the examples of functions where we can pass an optional number of grouping variables to some function are designed using the dots argument (...). In our function calls we then explicitly need to write out the grouping variables:

f(df, grouping_var1, grouping_var2, grouping_var3)
f(df, grouping_var1)
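
For reference, the dots pattern described above looks roughly like the following. This is only a minimal sketch: the function name f_dots and the toy data frame are made up for illustration and are not taken from the vignette.

```r
library(dplyr)

# Hypothetical toy data, used only for this illustration
toy <- data.frame(
  values = 1:6,
  g1 = c("a", "a", "b", "b", "a", "b"),
  g2 = c("x", "y", "x", "y", "x", "y")
)

# Grouping variables are forwarded unquoted through the dots
f_dots <- function(df, ...) {
  df %>%
    group_by(...) %>%
    summarise(values = sum(values)) %>%
    ungroup()
}

f_dots(toy, g1)       # one grouping variable
f_dots(toy, g1, g2)   # two grouping variables
```

Each call has to spell out its grouping columns by hand, which is the part that does not scale to many calls.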

How can I -- instead of explicitly writing out the variables in the function call -- pass a list (or vector) of these variables and get the desired output? E.g.

var_list <- c("grouping_var1", "grouping_var2", "grouping_var3")
f(df, var_list)

I found one potential way of solving this after I wrote the question, and I can paste that below in order to provide a better intuition on what I want:

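library(dplyr)  # provides %>%, group_by(), summarise_at(), mutate() and re-exports syms()

set.seed(1)
# stringsAsFactors = TRUE keeps the grouping columns as factors
# (the default before R 4.0), matching the <fct> columns in the output below
df <- data.frame(values = sample(x = 1:10, size = 10),
                 grouping_var1 = sample(x = letters[1:2], size = 10, replace = TRUE),
                 grouping_var2 = sample(x = letters[24:26], size = 10, replace = TRUE),
                 grouping_var3 = sample(x = LETTERS[1:2], size = 10, replace = TRUE),
                 stringsAsFactors = TRUE)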

f <- function(df, group_var){
  # Convert the character vector of column names into symbols,
  # so they can be spliced into group_by() with !!!
  my_group_vars <- syms(group_var$group_var)

  df %>%
    group_by(!!! my_group_vars) %>%
    summarise_at(.vars = "values", .funs = sum) %>%
    mutate(group_ids = group_var$group_ids)
}

params_list <- list(
  list(group_var = c("grouping_var1"), group_ids = "var_1"),
  list(group_var = c("grouping_var1", "grouping_var2"), group_ids = "var_1_2"),
  list(group_var = c("grouping_var1", "grouping_var3"), group_ids = "var_1_3")
  )

Output:

lapply(params_list, f, df = df)

[[1]]
# A tibble: 2 x 3
  grouping_var1 values group_ids
  <fct>          <int> <chr>    
1 a                 31 var_1    
2 b                 24 var_1    

[[2]]
# A tibble: 5 x 4
# Groups:   grouping_var1 [2]
  grouping_var1 grouping_var2 values group_ids
  <fct>         <fct>          <int> <chr>    
1 a             x                 19 var_1_2  
2 a             y                  8 var_1_2  
3 a             z                  4 var_1_2  
4 b             x                 21 var_1_2  
5 b             y                  3 var_1_2  

[[3]]
# A tibble: 4 x 4
# Groups:   grouping_var1 [2]
  grouping_var1 grouping_var3 values group_ids
  <fct>         <fct>          <int> <chr>    
1 a             A                  9 var_1_3  
2 a             B                 22 var_1_3  
3 b             A                 15 var_1_3  
4 b             B                  9 var_1_3

I'd be grateful for more examples of how this can be done.
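
One further variant, offered as a sketch rather than a recommended answer: with dplyr 1.0.0 or later (an assumption, since that version is newer than this thread), a character vector of column names can be passed to group_by() through across(all_of(...)), with no symbols or splicing. The names f_across, toy and var_sets below are hypothetical.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for across() and .groups

# Hypothetical toy data, used only for this illustration
toy <- data.frame(
  values = 1:6,
  g1 = c("a", "a", "b", "b", "a", "b"),
  g2 = c("x", "y", "x", "y", "x", "y")
)

# group_vars is a character vector of column names
f_across <- function(df, group_vars) {
  df %>%
    group_by(across(all_of(group_vars))) %>%
    summarise(values = sum(values), .groups = "drop")
}

# One function call per set of grouping variables
var_sets <- list("g1", c("g1", "g2"))
results <- lapply(var_sets, f_across, df = toy)
```

all_of() raises an error if a requested column is missing, which helps when the variable sets are built programmatically.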

dromano · March 29, 2020, 8:51pm

Thanks, @lalush -- that's much clearer. It looks like you've taken care of the essence of the problem; I'm not sure I can come up with improvements or alternatives, but will try.

joels · March 30, 2020, 12:10am

I'm not sure what the "standard" tidyverse approach is here, as I never really have a sense of whether I'm "doing it right" when I try to write generalized tidyverse functions for my typical workflows, but here's another approach.

First, we can generate a list of combinations of grouping columns, rather than hard-coding them. In this case, the list includes all possible combinations of 1, 2, or 3 grouping columns, but that can be pared back as needed.

library(tidyverse)

# Generate a list of combinations of grouping variables.
groups.list = map(1:3, ~combn(names(df)[map_lgl(df, ~!is.numeric(.))], .x, simplify=FALSE)) %>% 
  flatten
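
To make the shape of that list concrete, here is a small base R sketch of the same idea. It swaps the purrr calls for lapply() and unlist(), and the names cols and combos are made up for this illustration.

```r
cols <- c("grouping_var1", "grouping_var2", "grouping_var3")

# For each size k, combn() returns every k-element subset as a list element;
# unlist(recursive = FALSE) removes one level of nesting, like flatten()
combos <- unlist(
  lapply(seq_along(cols), function(k) combn(cols, k, simplify = FALSE)),
  recursive = FALSE
)

length(combos)   # 3 singles + 3 pairs + 1 triple = 7
combos[[4]]      # "grouping_var1" "grouping_var2"
```

Each element is a plain character vector, so it can be handed to any function that accepts column names as strings.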

Below is a summary function that uses group_by_at, which can take strings, so there's no need for non-standard evaluation. In addition, we get the group.ids values from group_vars itself, so we don't need a separate parameter or argument (though this may need to be tweaked, depending on what you expect for the names of the grouping columns).

# Summarise for each combination of groups
# Generate group.ids from group_vars itself
f2 <- function(data, group_vars) {

  data %>%
    group_by_at(group_vars) %>%
    summarise(values=sum(values)) %>% 
    mutate(group.ids=paste0("var_", paste(str_extract(group_vars, "[0-9]"), collapse="_")))

  }

Now we can run the function on every element of groups.list:

map(groups.list, ~f2(df, .x))

[[1]]
# A tibble: 2 x 3
  grouping_var1 values group.ids
  <fct>          <int> <chr>    
1 a                 31 var_1    
2 b                 24 var_1    

[[2]]
# A tibble: 3 x 3
  grouping_var2 values group.ids
  <fct>          <int> <chr>    
1 x                 40 var_2    
2 y                 11 var_2    
3 z                  4 var_2    

[[3]]
# A tibble: 2 x 3
  grouping_var3 values group.ids
  <fct>          <int> <chr>    
1 A                 24 var_3    
2 B                 31 var_3    

[[4]]
# A tibble: 5 x 4
# Groups:   grouping_var1 [2]
  grouping_var1 grouping_var2 values group.ids
  <fct>         <fct>          <int> <chr>    
1 a             x                 19 var_1_2  
2 a             y                  8 var_1_2  
3 a             z                  4 var_1_2  
4 b             x                 21 var_1_2  
5 b             y                  3 var_1_2  

[[5]]
# A tibble: 4 x 4
# Groups:   grouping_var1 [2]
  grouping_var1 grouping_var3 values group.ids
  <fct>         <fct>          <int> <chr>    
1 a             A                  9 var_1_3  
2 a             B                 22 var_1_3  
3 b             A                 15 var_1_3  
4 b             B                  9 var_1_3  

[[6]]
# A tibble: 4 x 4
# Groups:   grouping_var2 [3]
  grouping_var2 grouping_var3 values group.ids
  <fct>         <fct>          <int> <chr>    
1 x             A                 24 var_2_3  
2 x             B                 16 var_2_3  
3 y             B                 11 var_2_3  
4 z             B                  4 var_2_3  

[[7]]
# A tibble: 7 x 5
# Groups:   grouping_var1, grouping_var2 [5]
  grouping_var1 grouping_var2 grouping_var3 values group.ids
  <fct>         <fct>         <fct>          <int> <chr>    
1 a             x             A                  9 var_1_2_3
2 a             x             B                 10 var_1_2_3
3 a             y             B                  8 var_1_2_3
4 a             z             B                  4 var_1_2_3
5 b             x             A                 15 var_1_2_3
6 b             x             B                  6 var_1_2_3
7 b             y             B                  3 var_1_2_3

Or, if you want to combine all of the results, you could do something like this:

map(groups.list, ~f2(df, .x)) %>% 
  bind_rows() %>% 
  mutate_if(is.factor, fct_explicit_na, na_level="All") %>% 
  select(group.ids, matches("grouping"), values)

   group.ids grouping_var1 grouping_var2 grouping_var3 values
   <chr>     <fct>         <fct>         <fct>          <int>
 1 var_1     a             All           All               31
 2 var_1     b             All           All               24
 3 var_2     All           x             All               40
 4 var_2     All           y             All               11
 5 var_2     All           z             All                4
 6 var_3     All           All           A                 24
 7 var_3     All           All           B                 31
 8 var_1_2   a             x             All               19
 9 var_1_2   a             y             All                8
10 var_1_2   a             z             All                4
11 var_1_2   b             x             All               21
12 var_1_2   b             y             All                3
13 var_1_3   a             All           A                  9
14 var_1_3   a             All           B                 22
15 var_1_3   b             All           A                 15
16 var_1_3   b             All           B                  9
17 var_2_3   All           x             A                 24
18 var_2_3   All           x             B                 16
19 var_2_3   All           y             B                 11
20 var_2_3   All           z             B                  4
21 var_1_2_3 a             x             A                  9
22 var_1_2_3 a             x             B                 10
23 var_1_2_3 a             y             B                  8
24 var_1_2_3 a             z             B                  4
25 var_1_2_3 b             x             A                 15
26 var_1_2_3 b             x             B                  6
27 var_1_2_3 b             y             B                  3
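
The "All" entries come from how bind_rows() treats columns that are missing from some inputs: it fills them with NA, and the NA values are then relabelled. A minimal sketch of that mechanism, assuming character rather than factor columns and using made-up summaries by_g1 and by_g2:

```r
library(dplyr)

# Two hypothetical summaries with different grouping columns
by_g1 <- data.frame(g1 = c("a", "b"), values = c(8L, 13L), stringsAsFactors = FALSE)
by_g2 <- data.frame(g2 = c("x", "y"), values = c(9L, 12L), stringsAsFactors = FALSE)

# Columns absent from one input are filled with NA
combined <- bind_rows(by_g1, by_g2)

# Replace the NA values in the character columns with an explicit label
labelled <- combined %>%
  mutate_if(is.character, ~ coalesce(.x, "All"))
```

With factor columns, as in the thread, the relabelling step needs a forcats helper instead of coalesce(), because "All" is not an existing factor level.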
