Use mutate and map function on a list of dataframes

I am trying to use the map function for something fairly complex. I have a list of monthly dataframes (which should be kept separate), and for each one I'd like to combine the values of the Result column with an external vector, where the vector to use depends on a categorical variable (cat) inside the dataframe. I defined two different functions to pass to map, but I am getting an error. Ideally I would also create a new column in each dataframe of the list to store the new values, but I am not sure how to do that with "mutate" given that the object is a list.

Thanks a lot

rm(list = ls())

setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
#> Error: RStudio not running
getwd()
#> [1] "C:/Users/Angela/AppData/Local/Temp/RtmpYhghZ1/reprex-3a1458d04eec-pure-tayra"

#load required packages 
library(mc2d)
#> Loading required package: mvtnorm
#> 
#> Attaching package: 'mc2d'
#> The following objects are masked from 'package:base':
#> 
#>     pmax, pmin
library(gplots)
#> 
#> Attaching package: 'gplots'
#> The following object is masked from 'package:stats':
#> 
#>     lowess
library(RColorBrewer)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(reprex)
library(tidyverse)
set.seed(99)
iters<-1000

df<-data.frame(id=c(1:30),cat=c(rep("a",12),rep("b",18)),month=c(1:6,1,6,4,1,5,2,3,2,5,4,6,3:6,4:6,1:5,5),n=rpois(30,5))

# replace zero counts with 3 (compare against the number 0, not the string "0")
df$n[df$n == 0] <- 3
se<-rbeta(iters,96,6)
epi.a<-rpert(iters,min=1.5, mode=2, max=3)
p=0.2
p2=epi.a*p

df<-as_tibble(df)
# this function ensures each `n` from `df` is iterated over the 1000 (`iters`) values of `se` and `p2`, generating 1000 results
iter_n <- function(n) map2_dbl(.x = se, .y = p2, ~ 1 - (1 - .x * .y) ^ n)
list_1 <- df %>% mutate(Result = map(n, ~iter_n(.x))) %>% unnest(Result)%>% group_split(month)
list_1[[1]]
#> # A tibble: 4,000 x 5
#>       id cat   month     n Result
#>    <int> <chr> <dbl> <dbl>  <dbl>
#>  1     1 a         1     5  0.953
#>  2     1 a         1     5  0.927
#>  3     1 a         1     5  0.904
#>  4     1 a         1     5  0.945
#>  5     1 a         1     5  0.872
#>  6     1 a         1     5  0.840
#>  7     1 a         1     5  0.896
#>  8     1 a         1     5  0.944
#>  9     1 a         1     5  0.925
#> 10     1 a         1     5  0.937
#> # ... with 3,990 more rows

p3a=rbeta(iters,50,5)
p3b=rbeta(iters,40,6)

iter_n2a<-function(Result) map_dbl(p3a, ~ prod(1 - Result * .x))
iter_n2b<-function(Result) map_dbl(p3b, ~ prod(1 - Result * .x))

list_2 <- list_1%>% map( ~ mutate(n_p = if_else(.x$cat == "a",
                                    map(.x$Result,  ~ iter_n2a(.x)),
                                    map(.x$Result,  ~ iter_n2b(.x)))))
#> Error in UseMethod("mutate"): no applicable method for 'mutate' applied to an object of class "list"

Created on 2022-05-06 by the reprex package (v2.0.1)

It's a little tricky with the map function. If you're using a structure like list %>% map(), remember that the .x object inside the map call refers to each element of the list.
In the case above, you're iterating over a list of data.frames, so .x in the outer map refers to each data.frame in list_1. The error occurs because mutate() in your code was called without a data frame as its first argument.
In addition, mutate looks up column names in the data frame passed to its .data argument, so you don't have to write .x$... inside mutate().
So the code would be:

list_2 <- list_1 %>% map(
  ~ .x %>% mutate(
    n_p = if_else(
      cat == "a", 
      map(Result, iter_n2a), # `iter_n2a` takes a single argument, so we can pass it to map by name
      map(Result, iter_n2b))
  ))

# or the clearer version:

list_2 <- map(
  list_1,
  function(df) {
    df %>% mutate(
      n_p = if_else(cat == "a", map(Result, iter_n2a), map(Result, iter_n2b))
    )
  }
)
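
To see the pattern in isolation, here is a minimal sketch on toy data. The names `toy`, `toy_list` and `doubled` are made up for illustration and are not part of the original reprex; the point is only that `.x` is one data frame of the list, and that `mutate()` then sees its columns by bare name.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data standing in for the monthly data frames
toy <- tibble(
  cat    = c("a", "a", "b", "b"),
  month  = c(1, 2, 1, 2),
  Result = c(0.1, 0.2, 0.3, 0.4)
)

# One data frame per month, analogous to list_1
toy_list <- group_split(toy, month)

# `.x` is each data frame in turn; `Result` is looked up inside it by mutate()
toy_out <- map(toy_list, ~ mutate(.x, doubled = Result * 2))

toy_out[[1]]
```

The result is still a list with one data frame per month, each carrying the new column.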


list_2
[[1]]
# A tibble: 4,000 x 6
      id cat   month     n Result n_p          
   <int> <chr> <dbl> <dbl>  <dbl> <list>       
 1     1 a         1     8  0.968 <dbl [1,000]>
 2     1 a         1     8  0.976 <dbl [1,000]>
 3     1 a         1     8  0.969 <dbl [1,000]>
 4     1 a         1     8  0.967 <dbl [1,000]>
 5     1 a         1     8  0.961 <dbl [1,000]>
 6     1 a         1     8  0.983 <dbl [1,000]>
 7     1 a         1     8  0.977 <dbl [1,000]>
 8     1 a         1     8  0.961 <dbl [1,000]>
 9     1 a         1     8  0.953 <dbl [1,000]>
10     1 a         1     8  0.945 <dbl [1,000]>
# ... with 3,990 more rows

[[2]]
# A tibble: 4,000 x 6
      id cat   month     n Result n_p          
   <int> <chr> <dbl> <dbl>  <dbl> <list>       
 1     2 a         2     1  0.349 <dbl [1,000]>
 2     2 a         2     1  0.371 <dbl [1,000]>
 3     2 a         2     1  0.353 <dbl [1,000]>
 4     2 a         2     1  0.348 <dbl [1,000]>
 5     2 a         2     1  0.333 <dbl [1,000]>
 6     2 a         2     1  0.397 <dbl [1,000]>
 7     2 a         2     1  0.374 <dbl [1,000]>
 8     2 a         2     1  0.332 <dbl [1,000]>
 9     2 a         2     1  0.318 <dbl [1,000]>
10     2 a         2     1  0.304 <dbl [1,000]>
# ... with 3,990 more rows

[[3]]
# A tibble: 4,000 x 6
      id cat   month     n Result n_p          
   <int> <chr> <dbl> <dbl>  <dbl> <list>       
 1     3 a         3     6  0.924 <dbl [1,000]>
 2     3 a         3     6  0.938 <dbl [1,000]>
 3     3 a         3     6  0.926 <dbl [1,000]>
 4     3 a         3     6  0.923 <dbl [1,000]>
 5     3 a         3     6  0.912 <dbl [1,000]>
 6     3 a         3     6  0.952 <dbl [1,000]>
 7     3 a         3     6  0.940 <dbl [1,000]>
 8     3 a         3     6  0.912 <dbl [1,000]>
 9     3 a         3     6  0.900 <dbl [1,000]>
10     3 a         3     6  0.887 <dbl [1,000]>
# ... with 3,990 more rows

[[4]]
# A tibble: 6,000 x 6
      id cat   month     n Result n_p          
   <int> <chr> <dbl> <dbl>  <dbl> <list>       
 1     4 a         4     4  0.821 <dbl [1,000]>
 2     4 a         4     4  0.844 <dbl [1,000]>
 3     4 a         4     4  0.824 <dbl [1,000]>
 4     4 a         4     4  0.820 <dbl [1,000]>
 5     4 a         4     4  0.802 <dbl [1,000]>
 6     4 a         4     4  0.868 <dbl [1,000]>
 7     4 a         4     4  0.847 <dbl [1,000]>
 8     4 a         4     4  0.801 <dbl [1,000]>
 9     4 a         4     4  0.784 <dbl [1,000]>
10     4 a         4     4  0.766 <dbl [1,000]>
# ... with 5,990 more rows

[[5]]
# A tibble: 7,000 x 6
      id cat   month     n Result n_p          
   <int> <chr> <dbl> <dbl>  <dbl> <list>       
 1     5 a         5     2  0.576 <dbl [1,000]>
 2     5 a         5     2  0.605 <dbl [1,000]>
 3     5 a         5     2  0.581 <dbl [1,000]>
 4     5 a         5     2  0.575 <dbl [1,000]>
 5     5 a         5     2  0.556 <dbl [1,000]>
 6     5 a         5     2  0.637 <dbl [1,000]>
 7     5 a         5     2  0.609 <dbl [1,000]>
 8     5 a         5     2  0.554 <dbl [1,000]>
 9     5 a         5     2  0.535 <dbl [1,000]>
10     5 a         5     2  0.516 <dbl [1,000]>
# ... with 6,990 more rows
