setting list element names based on argument to `pmap`

IndrajeetPatil · November 2, 2018, 2:18am

Hi,

This is my first question on the RStudio community, so forgive me if this is against the community decorum.

I am stuck in my analysis with a purrr-related issue. I had created a question on StackOverflow with a reprex, but haven't received any response (which is highly unusual). So I wanted to raise the same question here and see if anybody has any thoughts on this:

If linking to StackOverflow questions is not kosher, please feel free to delete this question.

Thanks in advance,
Indra

mfherman · November 2, 2018, 3:02am

Welcome @IndrajeetPatil! You can check out this FAQ post for the policy on cross-posting: "FAQ: Is it OK if I cross-post?" Basically, it is not encouraged, but not outright disallowed. Given that you don't have any answers on SO and you link to the question, I think it is okay in this case. But I would actually post the text of the original question here as well as the link to SO so others don't have to go to SO to read the question.

As to your question, one solution I came up with is to feed a named list into pmap() so that the output list elements are named with those same names. Instead of manually naming the input list (because you said your real list may have a large number of elements), I use llist() from Hmisc to create the named list from the original object names.

From the llist() documentation:

llist is like list except that it preserves the names or labels of the component variables in the variables label attribute. This can be useful when looping over variables or using sapply or lapply. By using llist instead of list one can annotate the output with the current variable's name or label. llist also defines a names attribute for the list and pulls the names from the arguments' expressions for non-named arguments.

I think this achieves your goal without requiring you to type the list names twice:

# setup
library(tidyverse)
library(groupedstats)
set.seed(123)

# creating the dataframes
data_1 <- tibble::as.tibble(iris)
data_2 <- tibble::as.tibble(mtcars)
data_3 <- tibble::as.tibble(airquality)

# creating a list
purrr::pmap(
  .l = list(
    data = Hmisc::llist(data_1, data_2, data_3),
    grouping.vars = alist(Species, c(am, cyl), Month),
    measures = alist(c(Sepal.Length, Sepal.Width), wt, c(Ozone, Solar.R, Wind))
    ),
  .f = groupedstats::grouped_summary
  )
#> $data_1
#> # A tibble: 6 x 16
#>   Species type  variable missing complete     n  mean    sd   min   p25
#>   <fct>   <chr> <chr>      <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa  nume… Sepal.L…       0       50    50  5.01  0.35   4.3  4.8 
#> 2 setosa  nume… Sepal.W…       0       50    50  3.43  0.38   2.3  3.2 
#> 3 versic… nume… Sepal.L…       0       50    50  5.94  0.52   4.9  5.6 
#> 4 versic… nume… Sepal.W…       0       50    50  2.77  0.31   2    2.52
#> 5 virgin… nume… Sepal.L…       0       50    50  6.59  0.64   4.9  6.23
#> 6 virgin… nume… Sepal.W…       0       50    50  2.97  0.32   2.2  2.8 
#> # … with 6 more variables: median <dbl>, p75 <dbl>, max <dbl>,
#> #   std.error <dbl>, mean.low.conf <dbl>, mean.high.conf <dbl>
#> 
#> $data_2
#> # A tibble: 6 x 17
#>      am   cyl type  variable missing complete     n  mean    sd   min   p25
#>   <dbl> <dbl> <chr> <chr>      <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1     6 nume… wt             0        3     3  2.75  0.13  2.62  2.7 
#> 2     1     4 nume… wt             0        8     8  2.04  0.41  1.51  1.78
#> 3     0     6 nume… wt             0        4     4  3.39  0.12  3.21  3.38
#> 4     0     8 nume… wt             0       12    12  4.1   0.77  3.44  3.56
#> 5     0     4 nume… wt             0        3     3  2.94  0.41  2.46  2.81
#> 6     1     8 nume… wt             0        2     2  3.37  0.28  3.17  3.27
#> # … with 6 more variables: median <dbl>, p75 <dbl>, max <dbl>,
#> #   std.error <dbl>, mean.low.conf <dbl>, mean.high.conf <dbl>
#> 
#> $data_3
#> # A tibble: 15 x 16
#>    Month type  variable missing complete     n   mean     sd   min    p25
#>    <int> <chr> <chr>      <dbl>    <dbl> <dbl>  <dbl>  <dbl> <dbl>  <dbl>
#>  1     5 inte… Ozone          5       26    31  23.6   22.2    1    11   
#>  2     5 inte… Solar.R        4       27    31 181.   115.     8    72   
#>  3     5 nume… Wind           0       31    31  11.6    3.53   5.7   8.9 
#>  4     6 inte… Ozone         21        9    30  29.4   18.2   12    20   
#>  5     6 inte… Solar.R        0       30    30 190.    92.9   31   127   
#>  6     6 nume… Wind           0       30    30  10.3    3.77   1.7   8   
#>  7     7 inte… Ozone          5       26    31  59.1   31.6    7    36.2 
#>  8     7 inte… Solar.R        0       31    31 216.    80.6    7   175   
#>  9     7 nume… Wind           0       31    31   8.94   3.04   4.1   6.9 
#> 10     8 inte… Ozone          5       26    31  60.0   39.7    9    28.8 
#> 11     8 inte… Solar.R        3       28    31 172.    76.8   24   107   
#> 12     8 nume… Wind           0       31    31   8.79   3.23   2.3   6.6 
#> 13     9 inte… Ozone          1       29    30  31.4   24.1    7    16   
#> 14     9 inte… Solar.R        0       30    30 167.    79.1   14   117.  
#> 15     9 nume… Wind           0       30    30  10.2    3.46   2.8   7.55
#> # … with 6 more variables: median <dbl>, p75 <dbl>, max <dbl>,
#> #   std.error <dbl>, mean.low.conf <dbl>, mean.high.conf <dbl>

Created on 2018-11-01 by the reprex package (v0.2.1)
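The naming behaviour relied on above can be seen in a much smaller case. This is only a sketch with made-up inputs (the vectors and the anonymous function are hypothetical, not from the original question), and it assumes pmap() takes the names of its result from the first element of .l, which is what the output above suggests:

```r
library(purrr)

# The outer names (x, y) are matched to the arguments of .f.
# The inner names of the FIRST element (a, b) end up on the result.
out <- pmap(
  .l = list(
    x = list(a = 1, b = 2),  # named, like Hmisc::llist(data_1, ...)
    y = list(10, 20)         # unnamed, like the alist() arguments
  ),
  .f = function(x, y) x + y
)

# Inspect the names carried over to the result
names(out)
```

Any function that returns a named list for the first element of .l should therefore work in place of llist().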

IndrajeetPatil · November 2, 2018, 3:23am

Ah, sorry, should have checked out the FAQ. I will keep in mind these instructions about cross-posting the next time I ask a question here.

Thanks for the answer; I wasn't aware of the llist() function! This is pretty cool.

I will use this solution for now, but I would still like to know if there is a way to do the same purely within tidyverse packages. I am a bit reluctant to add another package to the dependencies only for this one function. Maybe someone else has an alternative solution within tidyverse?

mfherman · November 2, 2018, 3:30am

If you don't want the Hmisc dependency, you could use base R's mget() and achieve similar results. The only difference is that you have to pass a character vector of object names; mget() looks them up in the calling environment (the global environment here) and returns a named list. I'm not aware of any tidyverse-specific approach to this, though there very well might be one!

# setup
library(tidyverse)
library(groupedstats)
set.seed(123)

# creating the dataframes
data_1 <- tibble::as.tibble(iris)
data_2 <- tibble::as.tibble(mtcars)
data_3 <- tibble::as.tibble(airquality)

# creating a list
purrr::pmap(
  .l = list(
    data = mget(c("data_1", "data_2", "data_3")),
    grouping.vars = alist(Species, c(am, cyl), Month),
    measures = alist(c(Sepal.Length, Sepal.Width), wt, c(Ozone, Solar.R, Wind))
  ),
  .f = groupedstats::grouped_summary
)
#> $data_1
#> # A tibble: 6 x 16
#>   Species type  variable missing complete     n  mean    sd   min   p25
#>   <fct>   <chr> <chr>      <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa  nume… Sepal.L…       0       50    50  5.01  0.35   4.3  4.8 
#> 2 setosa  nume… Sepal.W…       0       50    50  3.43  0.38   2.3  3.2 
#> 3 versic… nume… Sepal.L…       0       50    50  5.94  0.52   4.9  5.6 
#> 4 versic… nume… Sepal.W…       0       50    50  2.77  0.31   2    2.52
#> 5 virgin… nume… Sepal.L…       0       50    50  6.59  0.64   4.9  6.23
#> 6 virgin… nume… Sepal.W…       0       50    50  2.97  0.32   2.2  2.8 
#> # … with 6 more variables: median <dbl>, p75 <dbl>, max <dbl>,
#> #   std.error <dbl>, mean.low.conf <dbl>, mean.high.conf <dbl>
#> 
#> $data_2
#> # A tibble: 6 x 17
#>      am   cyl type  variable missing complete     n  mean    sd   min   p25
#>   <dbl> <dbl> <chr> <chr>      <dbl>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1     6 nume… wt             0        3     3  2.75  0.13  2.62  2.7 
#> 2     1     4 nume… wt             0        8     8  2.04  0.41  1.51  1.78
#> 3     0     6 nume… wt             0        4     4  3.39  0.12  3.21  3.38
#> 4     0     8 nume… wt             0       12    12  4.1   0.77  3.44  3.56
#> 5     0     4 nume… wt             0        3     3  2.94  0.41  2.46  2.81
#> 6     1     8 nume… wt             0        2     2  3.37  0.28  3.17  3.27
#> # … with 6 more variables: median <dbl>, p75 <dbl>, max <dbl>,
#> #   std.error <dbl>, mean.low.conf <dbl>, mean.high.conf <dbl>
#> 
#> $data_3
#> # A tibble: 15 x 16
#>    Month type  variable missing complete     n   mean     sd   min    p25
#>    <int> <chr> <chr>      <dbl>    <dbl> <dbl>  <dbl>  <dbl> <dbl>  <dbl>
#>  1     5 inte… Ozone          5       26    31  23.6   22.2    1    11   
#>  2     5 inte… Solar.R        4       27    31 181.   115.     8    72   
#>  3     5 nume… Wind           0       31    31  11.6    3.53   5.7   8.9 
#>  4     6 inte… Ozone         21        9    30  29.4   18.2   12    20   
#>  5     6 inte… Solar.R        0       30    30 190.    92.9   31   127   
#>  6     6 nume… Wind           0       30    30  10.3    3.77   1.7   8   
#>  7     7 inte… Ozone          5       26    31  59.1   31.6    7    36.2 
#>  8     7 inte… Solar.R        0       31    31 216.    80.6    7   175   
#>  9     7 nume… Wind           0       31    31   8.94   3.04   4.1   6.9 
#> 10     8 inte… Ozone          5       26    31  60.0   39.7    9    28.8 
#> 11     8 inte… Solar.R        3       28    31 172.    76.8   24   107   
#> 12     8 nume… Wind           0       31    31   8.79   3.23   2.3   6.6 
#> 13     9 inte… Ozone          1       29    30  31.4   24.1    7    16   
#> 14     9 inte… Solar.R        0       30    30 167.    79.1   14   117.  
#> 15     9 nume… Wind           0       30    30  10.2    3.46   2.8   7.55
#> # … with 6 more variables: median <dbl>, p75 <dbl>, max <dbl>,
#> #   std.error <dbl>, mean.low.conf <dbl>, mean.high.conf <dbl>

Created on 2018-11-01 by the reprex package (v0.2.1)
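To see what mget() does on its own, here is a minimal base R sketch. The object names data_a and data_b are hypothetical and chosen only for illustration; the ls(pattern = ...) step is one possible way to avoid typing the names by hand when there are many objects, not something from the original question:

```r
# Hypothetical objects, used only for illustration
data_a <- head(iris, 3)
data_b <- head(mtcars, 3)

# mget() looks the objects up by name and returns a list
# whose element names are the strings that were passed in
named <- mget(c("data_a", "data_b"))

# Plain list() keeps the values but not the object names
unnamed <- list(data_a, data_b)

# The name vector can also be built programmatically,
# e.g. from every object whose name matches a pattern
wanted <- ls(pattern = "^data_[ab]$")
named_from_pattern <- mget(wanted)

names(named)
names(unnamed)
```

Because the names are ordinary strings, this approach scales to many data frames as long as they follow a predictable naming scheme.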

cderv · November 2, 2018, 6:43am

You can also use the lst() function from the tidyverse to create the list. It is like tibble() but for lists, and it automatically names the list elements.
It is available from dplyr, which re-exports it from tibble.

library(tidyverse)
library(groupedstats)
set.seed(123)

# creating the dataframes
data_1 <- tibble::as.tibble(iris)
data_2 <- tibble::as.tibble(mtcars)
data_3 <- tibble::as.tibble(airquality)

# creating a list
purrr::pmap(
  .l = list(
    data = lst(data_1, data_2, data_3),
    grouping.vars = alist(Species, c(am, cyl), Month),
    measures = alist(c(Sepal.Length, Sepal.Width), wt, c(Ozone, Solar.R, Wind))
  ),
  .f = groupedstats::grouped_summary
) %>%
  str(1)
#> List of 3
#>  $ data_1:Classes 'tbl_df', 'tbl' and 'data.frame':  6 obs. of  16 variables:
#>  $ data_2:Classes 'tbl_df', 'tbl' and 'data.frame':  6 obs. of  17 variables:
#>  $ data_3:Classes 'tbl_df', 'tbl' and 'data.frame':  15 obs. of  16 variables:

Created on 2018-11-02 by the reprex package (v0.2.1)
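The automatic naming done by lst() is easy to check in isolation. A small sketch with throwaway objects (x and y are hypothetical, not part of the original reprex):

```r
library(tibble)

# Throwaway objects, used only for illustration
x <- 1:3
y <- letters[1:2]

# lst() names unnamed arguments after the expressions passed in
auto_named <- lst(x, y)

# base list() leaves them unnamed
not_named <- list(x, y)

names(auto_named)
names(not_named)
```

This is why swapping list() for lst() in the data argument is enough to get named output from pmap().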

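Note: If you want to stay fully within the tidyverse and use tidyeval, you can replace alist() with rlang::exprs(). However, it is not exported by the tidyverse package, so you have to call it with the rlang:: prefix or load rlang yourself.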
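For readers unfamiliar with alist(), this base R sketch shows what the grouping.vars argument in the examples actually contains: unevaluated symbols and calls, not values. It only exercises alist(); rlang::exprs() is expected to capture the same kind of expressions but is not run here:

```r
# alist() stores its arguments without evaluating them, so the
# bare column names do not need to exist as objects
grouping <- alist(Species, c(am, cyl), Month)

# Each element is either a symbol ("name") or a call
kinds <- vapply(grouping, function(e) class(e)[1], character(1))

# The expressions can be turned back into text for inspection
labels <- vapply(grouping, deparse, character(1))

# They are evaluated only later, inside a data frame
months <- eval(grouping[[3]], envir = airquality)

kinds
labels
```

That delayed evaluation is what lets each expression be resolved against its own data frame inside pmap().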

mfherman · November 2, 2018, 10:01am

Aha! I kind of guessed there was a similar tidyverse function. Always love that there is so much more to learn!!
