How to pass variable names inside a tibble to a function

caayala · February 4, 2020, 5:55pm

I have a tibble with two string columns that contain names of variables of a data.frame (in this example, iris).

I want to pass those names to a function so that I get a list-column with the results. I don't know how to pass the names to the function mean_group.

suppressMessages(library(dplyr))

df_var <- tibble(group_var = c("Species", "Species"),
                 var = c("Sepal.Length", "Sepal.Width"))

mean_group <- function(data, group_var, mean_var) {
  data %>%
    group_by({{ group_var }}) %>%
    summarise(mean = mean({{ mean_var }}))
}

# Using dplyr, naive approach
df_var %>% 
  mutate(tab = list(mean_group(iris, group_var, var)))
#> Error: Column `group_var` is unknown

# Using purrr
df_var %>% 
  mutate(tab = purrr::map2(group_var, var, ~mean_group(iris, .x, .y)))
#> Error: Column `.x` is unknown

# Expected result:

# This is what I expect to get inside a list-column.
t1 <- mean_group(iris, Species, Sepal.Length)
t2 <- mean_group(iris, Species, Sepal.Width)

df_var <- df_var %>% 
  mutate(tab = list(t1, t2))

df_var
#> # A tibble: 2 x 3
#>   group_var var          tab             
#>   <chr>     <chr>        <list>          
#> 1 Species   Sepal.Length <tibble [3 × 2]>
#> 2 Species   Sepal.Width  <tibble [3 × 2]>

# After that, I'll do this:
df_var %>% 
  tidyr::unnest_longer(tab)
#> # A tibble: 6 x 3
#>   group_var var          tab$Species $mean
#>   <chr>     <chr>        <fct>       <dbl>
#> 1 Species   Sepal.Length setosa       5.01
#> 2 Species   Sepal.Length versicolor   5.94
#> 3 Species   Sepal.Length virginica    6.59
#> 4 Species   Sepal.Width  setosa       3.43
#> 5 Species   Sepal.Width  versicolor   2.77
#> 6 Species   Sepal.Width  virginica    2.97

Created on 2020-02-04 by the reprex package (v0.3.0)
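Editor's note: both errors above seem to come from the same place. `{{ }}` forwards the expression the caller typed, not the string stored in it, so `group_by()` ends up looking for a column literally called `group_var` or `.x`. The sketch below illustrates this with a hypothetical helper, `show_arg()` (not part of the original post), that reports which expression would be forwarded. It assumes rlang and purrr are installed.

```r
library(rlang)

# Hypothetical helper: returns a label for the expression that
# enquo() (and therefore {{ }}) would forward to dplyr.
show_arg <- function(x) as_label(enquo(x))

# A bare column name is forwarded as typed.
direct <- show_arg(Species)
direct
#> [1] "Species"

# Inside a purrr lambda the expression typed is `.x`, not its value,
# which matches the "Column `.x` is unknown" error.
in_lambda <- purrr::map_chr("Species", ~ show_arg(.x))
in_lambda
#> [1] ".x"

# Turning the string into a symbol and unquoting it forwards the column name.
unquoted <- purrr::map_chr("Species", ~ show_arg(!!sym(.x)))
unquoted
#> [1] "Species"
```

The last call is the same idea as the `rlang::sym()` approach that appears later in this thread.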

technocrat · February 4, 2020, 6:00pm

Could you elaborate on what you are looking to do beyond the following?

suppressPackageStartupMessages(library(dplyr)) 
iris %>% group_by(Species) %>% summarize(mean(Sepal.Length))
#> # A tibble: 3 x 2
#>   Species    `mean(Sepal.Length)`
#>   <fct>                     <dbl>
#> 1 setosa                     5.01
#> 2 versicolor                 5.94
#> 3 virginica                  6.59

Created on 2020-02-04 by the reprex package (v0.3.0)

caayala · February 4, 2020, 6:12pm

Hi. I extended the reprex to show what I'm trying to get in this example.

Thanks!

technocrat · February 5, 2020, 1:47am

I think this is close to what you're looking for:

suppressPackageStartupMessages(library(dplyr)) 
suppressPackageStartupMessages(library(purrr)) 
iris %>%
  group_by(Species) %>%
  group_modify(~ {
    .x %>%
      map_dfc(mean) %>%
      mutate(nms = "mean")
  })
#> # A tibble: 3 x 6
#> # Groups:   Species [3]
#>   Species    Sepal.Length Sepal.Width Petal.Length Petal.Width nms  
#>   <fct>             <dbl>       <dbl>        <dbl>       <dbl> <chr>
#> 1 setosa             5.01        3.43         1.46       0.246 mean 
#> 2 versicolor         5.94        2.77         4.26       1.33  mean 
#> 3 virginica          6.59        2.97         5.55       2.03  mean

Created on 2020-02-04 by the reprex package (v0.3.0)

cderv · February 5, 2020, 6:48am

I don't think you need purrr here, and you don't need tidy evaluation with the {{ }} operator either.

Your input df_var contains the arguments for the function as character strings. That means you can use them as is with the *_at variants. Modifying mean_group accordingly gives you what you want.
You'll also need a group_by_all() so that the result is computed per row of your tibble, which reproduces the example you built with t1 and t2.

Here is what I got, along with a variant using purrr:

suppressMessages(library(dplyr))

df_var <- tibble(group_var = c("Species", "Species"),
                 var = c("Sepal.Length", "Sepal.Width"))

# Argument will be character string not quosure.
# using the *_at variant
mean_group <- function(data, group_var, mean_var) {
  data %>%
    group_by_at(group_var) %>%
    summarise_at(mean_var, list(mean = ~ mean(.x)))
}

# Using dplyr, naive approach
df_var %>% 
  group_by_all() %>%
  mutate(tab = list(mean_group(iris, group_var, var))) %>%
  tidyr::unnest_longer(tab)
#> # A tibble: 6 x 3
#> # Groups:   group_var, var [2]
#>   group_var var          tab$Species $mean
#>   <chr>     <chr>        <fct>       <dbl>
#> 1 Species   Sepal.Length setosa       5.01
#> 2 Species   Sepal.Length versicolor   5.94
#> 3 Species   Sepal.Length virginica    6.59
#> 4 Species   Sepal.Width  setosa       3.43
#> 5 Species   Sepal.Width  versicolor   2.77
#> 6 Species   Sepal.Width  virginica    2.97

# using purrr to get the same
df_var %>%
  mutate(tab = purrr::pmap(., ~ mean_group(iris, .x, .y))) %>%
  tidyr::unnest_longer(tab)
#> # A tibble: 6 x 3
#>   group_var var          tab$Species $mean
#>   <chr>     <chr>        <fct>       <dbl>
#> 1 Species   Sepal.Length setosa       5.01
#> 2 Species   Sepal.Length versicolor   5.94
#> 3 Species   Sepal.Length virginica    6.59
#> 4 Species   Sepal.Width  setosa       3.43
#> 5 Species   Sepal.Width  versicolor   2.77
#> 6 Species   Sepal.Width  virginica    2.97

Created on 2020-02-05 by the reprex package (v0.3.0.9001)

Hope it helps
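Editor's note: the `*_at` verbs used above were superseded by `across()` in dplyr 1.0.0, which was released after this thread. The following is a minimal sketch of the same character-based idea, assuming dplyr 1.0.0 or later. The function name `mean_group_across` is hypothetical, and `rowwise()` stands in for the `group_by_all()` step.

```r
suppressMessages(library(dplyr))

df_var <- tibble(group_var = c("Species", "Species"),
                 var = c("Sepal.Length", "Sepal.Width"))

# Hypothetical variant of mean_group: both arguments are character strings.
# all_of() selects the grouping column by name; .data[[ ]] looks up the
# column to average by name.
mean_group_across <- function(data, group_var, mean_var) {
  data %>%
    group_by(across(all_of(group_var))) %>%
    summarise(mean = mean(.data[[mean_var]]), .groups = "drop")
}

# rowwise() evaluates the call once per row of df_var, so each row
# gets its own summary tibble in the list-column.
result <- df_var %>%
  rowwise() %>%
  mutate(tab = list(mean_group_across(iris, group_var, var))) %>%
  ungroup() %>%
  tidyr::unnest(tab)

result
```

Like the version above, this only helps when the function itself can be changed to accept strings.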

caayala · February 5, 2020, 2:03pm

Hi, cderv. In the situation I'm trying to solve, I can't modify the function mean_group to make it character-aware. As in this example, the function I'm using receives quosures.

Do you know how I can solve this? How can I wrap

Thanks!

caayala · February 17, 2020, 8:59pm

To partially solve my problem I used the rlang::sym() function.

I don't understand why it doesn't work when I try to use the same code inside a mutate statement.

suppressMessages(library(dplyr))
library(rlang)

df_var <- tibble(group = c("Species", "Species"),
                 var = c("Sepal.Length", "Sepal.Width"))

mean_group <- function(data, group_var, mean_var) {
  data %>%
    group_by({{ group_var }}) %>%
    summarise(mean = mean({{ mean_var }}))
}

# Works in steps.
l_result <- purrr::map2(df_var$group, df_var$var, ~ mean_group(iris, !!rlang::sym(.x), !!rlang::sym(.y)))

df_result <- df_var %>% 
  mutate(l_result = l_result)

# Expected result
df_result %>% 
  tidyr::unnest(l_result)
#> # A tibble: 6 x 4
#>   group   var          Species     mean
#>   <chr>   <chr>        <fct>      <dbl>
#> 1 Species Sepal.Length setosa      5.01
#> 2 Species Sepal.Length versicolor  5.94
#> 3 Species Sepal.Length virginica   6.59
#> 4 Species Sepal.Width  setosa      3.43
#> 5 Species Sepal.Width  versicolor  2.77
#> 6 Species Sepal.Width  virginica   2.97

# Why does this code not work?
df_var %>% 
  mutate(l_result = purrr::map2(group, var, ~ mean_group(iris, !!rlang::sym(.x), !!rlang::sym(.y)))) 
#> Error in is_symbol(x): object '.x' not found

Created on 2020-02-17 by the reprex package (v0.3.0)

lionel · February 24, 2020, 5:51pm

It doesn't work because the unquoting happens too early. I plan to explore ways of fixing this in the next major release of rlang:
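Editor's note: to spell out "too early": `mutate()` itself processes `!!` when it captures its arguments, so `rlang::sym(.x)` is evaluated before `map2()` has created `.x`. One possible workaround, sketched below, is to keep `!!` out of the `mutate()` call by moving it into a small wrapper function. The wrapper name `mean_group_chr` is hypothetical. `mean_group` is left untouched, as the original poster required.

```r
suppressMessages(library(dplyr))

df_var <- tibble(group = c("Species", "Species"),
                 var = c("Sepal.Length", "Sepal.Width"))

# Unchanged from the thread: expects bare column names.
mean_group <- function(data, group_var, mean_var) {
  data %>%
    group_by({{ group_var }}) %>%
    summarise(mean = mean({{ mean_var }}))
}

# Hypothetical wrapper: takes strings, converts them to symbols, and
# unquotes them in the call to mean_group(). The `!!` lives in this
# function body, so mutate() never sees it.
mean_group_chr <- function(data, group_name, mean_name) {
  mean_group(data, !!rlang::sym(group_name), !!rlang::sym(mean_name))
}

# No `!!` appears in the expression handed to mutate(), so nothing is
# unquoted before map2() binds .x and .y.
result <- df_var %>%
  mutate(l_result = purrr::map2(group, var, ~ mean_group_chr(iris, .x, .y))) %>%
  tidyr::unnest(l_result)

result
```

This is the same `sym()` plus `!!` combination as in the "works in steps" code above, only relocated so that the unquoting happens when the wrapper runs rather than when `mutate()` captures its arguments.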
