trouble with map2 and group_by

I'm trying to learn the purrr::map functions and am having trouble with map2() and a function that includes group_by(). I may be doing something elementary wrong, but I can't tell what.

Reprex below

#load libraries
library(rlang)
library(tidyverse)
library(fivethirtyeight)
library(purrr)

# make function
mean_grouped <- function(data, groupvar, meanvar) {
  data %>%
    group_by({{ groupvar }}) %>%
    summarise(mean = mean({{ meanvar }}, na.rm = TRUE))
}

# test function
mean_grouped(starwars, homeworld, height)
#> # A tibble: 49 x 2
#>    homeworld       mean
#>    <chr>          <dbl>
#>  1 Alderaan        176.
#>  2 Aleen Minor      79 
#>  3 Bespin          175 
#>  4 Bestine IV      180 
#>  5 Cato Neimoidia  191 
#>  6 Cerea           198 
#>  7 Champala        196 
#>  8 Chandrila       150 
#>  9 Concord Dawn    183 
#> 10 Corellia        175 
#> # … with 39 more rows

# try to purrr::map function over possible options for grouping var
# and for mean var
# first create df of all combinations
group_vars = c("gender", "homeworld", "species",
               "hair_color", "eye_color", "skin_color")
mean_vars = c("height", "mass", "birth_year")
vars_list <- list(x = group_vars, y = mean_vars)
cross_list <- cross_df(vars_list)

# Now map attempt
starwars %>%
  map2_dfr(.x = cross_list$x,
           .y = cross_list$y,
           .f = mean_grouped)
#> Error in UseMethod("group_by_"): no applicable method for 'group_by_' applied to an object of class "character"

Created on 2019-06-30 by the reprex package (v0.3.0)

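I think there are a couple of reasons your map2_dfr() is giving an error. The first is that, unlike many other tidyverse functions, the map family of functions doesn't treat its first argument as the data frame to operate on; it iterates over the elements of .x (and .y). When you pipe starwars into map2_dfr() with .x, .y, and .f all named, the data frame is not used as the data argument of mean_grouped(). As far as I can tell, it is passed along through ... as an extra argument instead, so data ends up being one of your strings, which is why group_by() complains about an object of class "character".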

map2() expects .x and .y to be vectors, and you are correctly specifying them here as your grouping and mean vars. What you want is to iterate over those vectors in parallel and pass each pair to mean_grouped() as groupvar and meanvar, with starwars supplied explicitly as the data argument.
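To see where a piped data frame ends up, here is a minimal sketch. The helper which_arg() is hypothetical and only reports the class of each argument it receives, and mtcars stands in for starwars so that only purrr is needed. The piped form `mtcars %>% map2(.x = ..., .y = ..., .f = ...)` is the same call as the one written out below.

```r
library(purrr)

# Hypothetical helper with the same argument order as mean_grouped():
# it reports the class of whatever lands in each argument.
which_arg <- function(data, groupvar, meanvar) {
  c(
    data = class(data)[1],
    groupvar = class(groupvar)[1],
    meanvar = class(meanvar)[1]
  )
}

# Same call as: mtcars %>% map2(.x = "cyl", .y = "mpg", .f = which_arg)
# Because .x, .y, and .f are named, the unnamed data frame falls into `...`
# and is handed to .f after the .x and .y elements.
res <- map2(mtcars, .x = "cyl", .y = "mpg", .f = which_arg)

res[[1]]
#>         data     groupvar      meanvar
#>  "character"  "character" "data.frame"
```

The data frame arrives as the third argument, so `data` is a string, which matches the "applied to an object of class "character"" error in the question.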

A second issue is discussed in the last example here: https://www.tidyverse.org/articles/2019/06/rlang-0-4-0/. Because you are passing string vectors to your custom function (by way of map2()), you should refer to them using .data[[var]] rather than {{var}}.

I'm not sure exactly what final output you're hoping for, but below is an example that might be what you're after.

library(tidyverse)

# Column names arrive as strings, so look them up with the .data pronoun
mean_grouped <- function(data, groupvar, meanvar) {
  data %>%
    group_by(.data[[groupvar]]) %>%
    summarise(mean = mean(.data[[meanvar]], na.rm = TRUE))
}

group_vars <- c(
  "gender", "homeworld", "species",
  "hair_color", "eye_color", "skin_color"
)

mean_vars <- c("height", "mass", "birth_year")
vars_list <- list(x = group_vars, y = mean_vars)
cross_list <- cross_df(vars_list)

# Iterate over the pairs; the lambda passes starwars explicitly as `data`
map2_dfr(
  .x = cross_list$x,
  .y = cross_list$y,
  .f = ~ mean_grouped(starwars, .x, .y)
)
#> # A tibble: 453 x 7
#>    gender         mean homeworld   species hair_color eye_color skin_color
#>    <chr>         <dbl> <chr>       <chr>   <chr>      <chr>     <chr>     
#>  1 <NA>           120  <NA>        <NA>    <NA>       <NA>      <NA>      
#>  2 female         165. <NA>        <NA>    <NA>       <NA>      <NA>      
#>  3 hermaphrodite  175  <NA>        <NA>    <NA>       <NA>      <NA>      
#>  4 male           179. <NA>        <NA>    <NA>       <NA>      <NA>      
#>  5 none           200  <NA>        <NA>    <NA>       <NA>      <NA>      
#>  6 <NA>           139. <NA>        <NA>    <NA>       <NA>      <NA>      
#>  7 <NA>           176. Alderaan    <NA>    <NA>       <NA>      <NA>      
#>  8 <NA>            79  Aleen Minor <NA>    <NA>       <NA>      <NA>      
#>  9 <NA>           175  Bespin      <NA>    <NA>       <NA>      <NA>      
#> 10 <NA>           180  Bestine IV  <NA>    <NA>       <NA>      <NA>      
#> # … with 443 more rows

Created on 2019-06-30 by the reprex package (v0.3.0)


As an FYI, there is an RStudio interactive tutorial on purrr via the swirl course "Advanced R Programming". The module is "Functional Programming with purrr".


The entire collection of swirl courses is very useful.

Thanks! Ideally I would then gather the groupvars, and gather the meanvars, so that I end up with 3 columns of groupvar, meanvar, and value.
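One way to get that long shape directly, without gathering afterwards, might look like the sketch below. The names mean_grouped_long(), group_col, mean_col, and group_value are made up for illustration, and the extra group_value column (the level of the grouping variable) is an assumption about what the final table should contain. It uses tidyr::expand_grid() for the combinations and bind_rows() to stack the results.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical variant of mean_grouped() that returns a long table:
# one row per (grouping variable, group level, mean variable).
mean_grouped_long <- function(data, group_col, mean_col) {
  data %>%
    # Store the group levels under one fixed name so all results stack cleanly
    group_by(group_value = .data[[group_col]]) %>%
    summarise(value = mean(.data[[mean_col]], na.rm = TRUE)) %>%
    # .env makes it explicit that these are the function arguments, not columns
    mutate(groupvar = .env$group_col, meanvar = .env$mean_col) %>%
    select(groupvar, group_value, meanvar, value)
}

group_vars <- c(
  "gender", "homeworld", "species",
  "hair_color", "eye_color", "skin_color"
)
mean_vars <- c("height", "mass", "birth_year")

# All combinations of grouping variable and mean variable
combos <- expand_grid(group_col = group_vars, mean_col = mean_vars)

long_means <- map2(
  combos$group_col,
  combos$mean_col,
  function(g, m) mean_grouped_long(starwars, g, m)
) %>%
  bind_rows()
```

Stacking works here because all six grouping columns in starwars are character columns; with mixed types, group_value would need converting to a common type first.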

I am a little disappointed that I cannot pass string vectors to a function with curly-curly. Is this fixable? Part of the appeal of the tidyverse is that it mostly "just works", without a lot of funky exceptions.

More like Esperanto than English. :slight_smile:
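For what it's worth, one possible workaround keeps the curly-curly version of mean_grouped() from the question and converts the strings at the call site instead, using rlang::sym() together with the !! operator. This is only a sketch of that idea, not necessarily the recommended approach (the article linked above suggests .data[[var]] for strings), and the two short vectors passed to map2() are just illustrative.

```r
library(dplyr)
library(purrr)
library(rlang)

# The curly-curly function from the question, unchanged
mean_grouped <- function(data, groupvar, meanvar) {
  data %>%
    group_by({{ groupvar }}) %>%
    summarise(mean = mean({{ meanvar }}, na.rm = TRUE))
}

# sym() turns a string into a symbol; !! unquotes it at the call,
# so the function sees it as if the bare column name had been typed
by_string <- mean_grouped(starwars, !!sym("gender"), !!sym("height"))
by_name <- mean_grouped(starwars, gender, height)

# The same conversion inside an anonymous function passed to map2()
results <- map2(
  c("gender", "species"),
  c("height", "mass"),
  function(g, m) mean_grouped(starwars, !!sym(g), !!sym(m))
)
```

Here by_string and by_name should hold the same summary, and results is a list with one tibble per pair of column names.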
