List of undefined objects?

Lief · December 4, 2020, 9:42pm

I frequently need to create summary statistics for a dataset with many different grouping variables, and I'm trying to find a good solution. My current approach works OK, but I'd like to set up a list of variables to pass to map_dfr() instead of writing out the bind_rows() call by hand. The problem is that, as it is currently set up, the list would have to contain bare column names such as Species, and those are not defined as objects in the global environment. I don't know how to create a list of these undefined objects to then loop over for the aggregation and summarizing.

I'd also appreciate more elegant solutions to the general problem. The objective is to support an arbitrary number of group levels with arbitrary summarizing functions. In lay terms, I want to take a dataset, split it into groups, and calculate some value for each group, then repeat that process many times with different grouping variables. Everything should then be compiled into a single data frame where I can easily select different groups or group values to compare the summary statistic.
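For the general problem, one option is to reshape instead of looping: stack the grouping columns into long format so that every row carries a group name and a group value, then summarise once by both. This is a sketch, not the poster's method; it assumes dplyr >= 1.0 (for across() and .groups) and tidyr >= 1.0 (for pivot_longer()), and the object iris2 and the summary columns are illustrative choices.

```r
library(dplyr)
library(tidyr)

# Same example grouping columns as in the question, kept in a separate object
iris2 <- iris %>%
  mutate(longPetals = Petal.Length > mean(Petal.Length),
         widePetals = Petal.Width > mean(Petal.Width))

group_vars <- c("Species", "widePetals", "longPetals")

group_summary <- iris2 %>%
  # pivot_longer() needs one common type, so turn every grouping column into character
  mutate(across(all_of(group_vars), as.character)) %>%
  # one row per (original row, grouping variable): 150 rows x 3 variables = 450 rows
  pivot_longer(all_of(group_vars), names_to = "group", values_to = "group_value") %>%
  group_by(group, group_value) %>%
  # any summary expressions can go here; n() reproduces the counts
  summarise(n = n(), mean_sepal_length = mean(Sepal.Length), .groups = "drop")

group_summary
```

The trade-off is memory: the long table holds one copy of each row per grouping variable, which is fine for iris but worth keeping in mind for large data.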

library(tidyverse)

# create example data, with some arbitrary grouping variables
iris <- iris %>% 
  mutate(longPetals = Petal.Length > mean(Petal.Length),
         widePetals = Petal.Width > mean(Petal.Width)) 

# Using count as a simple example, could be a lot of different things going on in here by group
fn_group_count <- function(group, data = iris) {
  
  group_c <- deparse(substitute(group))
  
  data %>% 
    count({{ group }}) %>% 
    rename(group_value = {{ group }}) %>% 
    mutate(group = group_c, 
           group_value = as.character(group_value)) %>% 
    select(group, everything())
  
}

# using bind_rows with manual coding of each variable - would like to specify a list of variables instead
group_counts <- bind_rows(
  fn_group_count(Species),
  fn_group_count(widePetals),
  fn_group_count(longPetals)
)

# something like this - but this doesn't work because `Species` is not defined
group_vars <- c(Species, widePetals, longPetals)
map_dfr(group_vars, fn_group_count)
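If you'd rather keep bare column names, one way (a sketch, assuming dplyr >= 1.0, purrr and rlang) is to capture them unevaluated with rlang::quos() and inject each one with !!. Once the call goes through map_dfr(), deparse(substitute(group)) would see purrr's internal expression rather than the column name, so this hypothetical fn_group_count_quo() uses rlang::as_label() on the quosure instead.

```r
library(dplyr)
library(purrr)
library(rlang)

iris2 <- iris %>%
  mutate(longPetals = Petal.Length > mean(Petal.Length),
         widePetals = Petal.Width > mean(Petal.Width))

# Hypothetical variant of fn_group_count() that takes a quosure instead of a bare name
fn_group_count_quo <- function(group, dat = iris2) {
  label <- as_label(group)            # e.g. "Species"
  dat %>%
    count(group_value = !!group) %>%  # inject the captured column reference
    mutate(group = label,
           group_value = as.character(group_value)) %>%
    select(group, everything())
}

# quos() captures the bare names without evaluating them, so there is no "not found" error
group_vars <- quos(Species, widePetals, longPetals)
group_counts <- map_dfr(group_vars, fn_group_count_quo)
group_counts
```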

Lief · December 5, 2020, 1:37am

I just re-read the "Programming with dplyr" vignette and see that it recommends the .data pronoun for my use case. I last read that article over the summer, and it is much clearer now: embracing with {{ }} and the .data pronoun are much easier to understand than enquo() was.
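To make the difference concrete, here is a minimal comparison (a sketch assuming dplyr >= 1.0; the function names are hypothetical): {{ }} forwards a bare name supplied by the caller, while .data[[ ]] takes the column name as a string, which is exactly what a loop over a character vector provides.

```r
library(dplyr)

# {{ }} ("embracing") forwards a bare column name from the caller
count_by_name <- function(dat, var) {
  dat %>% count({{ var }})
}

# .data[[ ]] looks a column up by a character string, which suits loops over names
count_by_string <- function(dat, var) {
  dat %>% count(.data[[var]])
}

by_name <- count_by_name(iris, Species)
by_string <- count_by_string(iris, "Species")
```

Both calls count 50 rows per species; only the string version can be driven directly by something like c("Species", "widePetals").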

library(tidyverse)

# create example data, with some arbitrary grouping variables
iris <- iris %>% 
  mutate(longPetals = Petal.Length > mean(Petal.Length),
         widePetals = Petal.Width > mean(Petal.Width)) 

# rewrite using .data[[var]]
fn_group_count <- function(group, data = iris) {
  
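  data %>% 
    # naming the count() column directly avoids rename() with an external string
    count(group_value = .data[[group]]) %>% 
    # .env$group is the function argument, not a column
    mutate(group = .env$group, 
           group_value = as.character(group_value)) %>% 
    select(group, everything())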
  
}

# specify vars as character strings
group_vars <- c("Species", "widePetals", "longPetals")
map_dfr(group_vars, fn_group_count)
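To cover the "arbitrary summarizing functions" part of the original question, the same .data[[ ]] pattern can accept the summary expressions through the dots (...). This is a sketch assuming dplyr >= 1.0 and purrr; fn_group_summary() and the example summaries are hypothetical.

```r
library(dplyr)
library(purrr)

iris2 <- iris %>%
  mutate(longPetals = Petal.Length > mean(Petal.Length),
         widePetals = Petal.Width > mean(Petal.Width))

# Hypothetical generalisation of fn_group_count(): `...` takes any summarise() expressions
fn_group_summary <- function(group, dat, ...) {
  dat %>%
    group_by(group_value = .data[[group]]) %>%  # explicit name, no rename() needed
    summarise(..., .groups = "drop") %>%
    mutate(group = .env$group,                  # .env: the string argument, not a column
           group_value = as.character(group_value)) %>%
    select(group, everything())
}

group_vars <- c("Species", "widePetals", "longPetals")

# An anonymous function keeps the summary expressions unevaluated until summarise() sees them
group_summary <- map_dfr(group_vars, function(g) {
  fn_group_summary(g, iris2, n = n(), mean_sepal_length = mean(Sepal.Length))
})
group_summary
```

Passing the summaries inside an anonymous function, rather than through map_dfr()'s own dots, keeps it obvious where expressions such as n() are evaluated.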

technocrat · December 5, 2020, 3:26am

Thanks for surfacing the pronoun!

Nit: I find data inconvenient as an object name because in some situations the name resolves to the function utils::data rather than your own object, so you end up with a closure instead of a data frame. The same goes for df (stats::df). I've trained myself to use dat instead of either.
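A quick way to see the clash, assuming a fresh session where no object called data or dat has been created: the name data falls back to the function utils::data, while an undefined dat gives a plain "not found" error.

```r
# Assumes a fresh session in which no object named `data` or `dat` has been created.
# `data` then resolves to the function utils::data, so subsetting it fails confusingly:
msg_data <- tryCatch(data$Species, error = conditionMessage)
# `dat` has no such fallback, so the error names the real problem:
msg_dat <- tryCatch(dat$Species, error = conditionMessage)

msg_data  # "object of type 'closure' is not subsettable"
msg_dat   # "object 'dat' not found"
```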

Lief · December 5, 2020, 11:59pm

Retraining myself away from df and data would be a good idea, but at this point I think I'd miss seeing the old standby error: object of type 'closure' is not subsettable.

system · December 26, 2020, 11:59pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.