iterate through rows of criteria to subset dataset and analyze

I am trying to figure out a way to subset a data set using criteria stored in the rows of a data frame and generate a report for each row. Below is a minimal example where I calculate the average mpg for a given cyl and gear combination in the mtcars data.

library(tidyverse)

grps <- mtcars %>% 
  distinct(cyl, gear) 

mpg_summary <- function(sub_criteria){
  sub_criteria %>%
  left_join(mtcars) %>% 
  summarise(m_mpg = mean(mpg)) %>% 
  pull(m_mpg)
}

grps %>% 
  slice(1) %>% 
  mpg_summary()
#> Joining, by = c("cyl", "gear")
#> [1] 19.75

Created on 2019-10-02 by the reprex package (v0.3.0)

Here I used slice to select one row as the subsetting criteria. I figure that one of the map functions should let me generate a list of mpg_summary results for multiple rows, but I could not find one that does this. Your advice is appreciated.

If you want to go down the purrr route, you can do something like this:

library(tidyverse)

grps <- mtcars %>% 
    distinct(cyl, gear) 

mpg_summary <- function(sub_criteria){
    sub_criteria %>%
        left_join(mtcars) %>% 
        summarise(m_mpg = mean(mpg)) %>% 
        pull(m_mpg)
}

grps %>%
    group_nest(row_number()) %>% 
    pull(data) %>% 
    map_dbl(~mpg_summary(.x))
#> [1] 19.750 26.925 19.750 15.050 21.500 28.200 15.400 19.700
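
The same per-row iteration can also be written without the helper grouping column. As a rough sketch (not part of the reprex above), base split() turns grps into a list of one-row data frames that map_dbl can walk over. The explicit by = c("cyl", "gear") is my addition; it only names the join keys so the "Joining, by" message goes away.

```r
library(dplyr)
library(purrr)

grps <- mtcars %>%
  distinct(cyl, gear)

# Same summary as above, but with explicit join keys (an added assumption)
mpg_summary <- function(sub_criteria) {
  sub_criteria %>%
    left_join(mtcars, by = c("cyl", "gear")) %>%
    summarise(m_mpg = mean(mpg)) %>%
    pull(m_mpg)
}

# One single-row data frame per criteria row, kept in the original row order
criteria_list <- split(grps, seq_len(nrow(grps)))

# Named numeric vector with one mean per criteria row
res <- map_dbl(criteria_list, mpg_summary)
```

This should give the same eight values as the group_nest version, in the same order, with the row index as names.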

But if I were you, I would go for a workflow like this:

library(tidyverse)

grps <- mtcars %>% 
    distinct(cyl, gear) 

mtcars %>% 
    right_join(grps) %>% 
    group_by(cyl, gear) %>%
    summarise(m_mpg = mean(mpg)) %>% 
    pull(m_mpg)
#> Joining, by = c("cyl", "gear")
#> [1] 21.500 26.925 28.200 19.750 19.750 19.700 15.050 15.400

Thanks! group_nest is nice; it generates the list of data frames that I want.

I am not using your second approach because my real summary/reporting function (mpg_summary here) does a complex, expensive calculation and involves multiple data frames.
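
For anyone with a similar setup, here is a rough sketch of what I mean. The function report_one, the second data frame other_df, and the summary it computes are hypothetical stand-ins for the real, more expensive report. semi_join() keeps only the rows of each data frame that match one criteria row.

```r
library(dplyr)
library(purrr)

grps <- mtcars %>%
  distinct(cyl, gear)

# Hypothetical second data frame that shares the same key columns
other_df <- mtcars %>%
  select(cyl, gear, hp)

# Hypothetical report: subset each data frame by one criteria row,
# then compute something from both subsets
report_one <- function(sub_criteria, df1, df2) {
  keys <- c("cyl", "gear")
  sub1 <- semi_join(df1, sub_criteria, by = keys)
  sub2 <- semi_join(df2, sub_criteria, by = keys)
  tibble(
    cyl   = sub_criteria$cyl,
    gear  = sub_criteria$gear,
    m_mpg = mean(sub1$mpg),
    m_hp  = mean(sub2$hp)
  )
}

# One report per criteria row, stacked into a single data frame
reports <- grps %>%
  group_nest(row_number()) %>%
  pull(data) %>%
  map(report_one, df1 = mtcars, df2 = other_df) %>%
  bind_rows()
```

Passing the extra data frames as named arguments to map keeps the criteria row as the only thing that changes between calls.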
