Questions about purrr

Andrzej · May 9, 2021, 6:15am

Hi,
I have got questions about purrr:

Why are curly braces sometimes used in purrr::map or lapply? What do they do?
Why are functions passed to purrr::map as the .f argument written without parentheses:

plus <- function(x, y) x + y

x <- c(0, 0, 0, 0)
map_dbl(x, plus, runif(1))

I know that plus was defined as a function a little further up, but here:

map_dbl(x, plus, runif(1))

it just looks like a normal word and is not easily recognizable as a function straight away (especially in more complicated code). Using it with parentheses gives an error: "Error in plus() : argument "x" is missing, with no default".

map_dbl(x, plus(), runif(1))

How do I correctly use dplyr::mutate inside purrr::map and get to deeper nested elements of list-columns?
Thank you for your help.

mishabalyasin · May 9, 2021, 8:47am

You have quite a few adjacent, but not exactly similar questions :). I'll try to answer them all one-by-one, but don't worry if you don't get all of it at once, it takes a bit of practice to become familiar with this type of syntax:

In R you can define functions on the fly with function(args) {}. Curly braces when used in this way allow you to write multi-line functions "normally":

# if you wanted to do something like this:
multi_line_function <- function(x)
  line_1
  line_2

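# then on one line you still need braces, because without them the
# function body ends at the semicolon and line_2 runs on its own:
multi_line_function <- function(x) { line_1; line_2 }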

# or (which is better for readability)
multi_line_function <- function(x) {
  line_1
  line_2
}

This exact logic continues to lapply or purrr (or, actually, anywhere where you can pass a function in -- which is quite a few places since R and especially tidyverse are functional first!).
So, that means you can do something like:

purrr::map(list_of_things, function(x) {
  line_1 <- x
  line_2 <- x*2
  line_2
})
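A minimal runnable sketch of the same idea, in case it helps. The input list_of_things here is a made-up example (three numbers), not anything from your data:

```r
library(purrr)

# Hypothetical input, just for illustration
list_of_things <- list(1, 2, 3)

# Anonymous multi-line function passed straight to map()
doubled <- map(list_of_things, function(x) {
  line_1 <- x        # first statement
  line_2 <- x * 2    # second statement
  line_2             # the last expression is the return value
})

# lapply() accepts exactly the same kind of anonymous function
doubled_base <- lapply(list_of_things, function(x) {
  x * 2
})
```

Both calls return a list with the doubled values; the braces only group the statements into one function body.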

Then for your second question you need to again remember that in R functions are "first-class citizens", meaning that you can pass them around just like you would "normal" values (strings, integers, doubles, whatever). This is what you see in the map_dbl(x, plus, runif(1)) variant.

What it is essentially saying is something like this:

a. take each element of x, e.g., x[1]
b. run plus(x[1], runif(1)) on it
c. return result (in this case by ensuring that result is a double)
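The three steps above can be sketched as an explicit loop. To keep the result reproducible I'm assuming a fixed second argument (10) instead of runif(1), and a different x than in your example:

```r
library(purrr)

plus <- function(x, y) x + y
x <- c(1, 2, 3)

# map_dbl() gets the function itself (no parentheses) plus the extra argument
via_map <- map_dbl(x, plus, 10)

# Roughly what happens under the hood, written out by hand
via_loop <- numeric(length(x))
for (i in seq_along(x)) {
  via_loop[i] <- plus(x[i], 10)  # steps a and b: take x[i], call plus on it
}
# step c: both results are plain double vectors
```

The point is that map_dbl is the one that calls plus for you, once per element, which is why you hand it the name and not a call.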

The reason why plus() doesn't work is a bit more complicated to explain with simple analogies. Writing plus() calls the function right away with no arguments, which is where the 'argument "x" is missing' error comes from, whereas map wants the function itself. And since, like I've mentioned a couple of times by now, R is functional, your functions can return functions themselves, so writing plus() in a map only works if plus is defined something like this:

plus <- function(){
  function(x, y){
    x + y
  }
}

purrr::map_dbl(1:10, plus(), runif(1))
#>  [1]  1.651218  2.651218  3.651218  4.651218  5.651218  6.651218  7.651218
#>  [8]  8.651218  9.651218 10.651218

Created on 2021-05-09 by the reprex package (v2.0.0)

See how I have another function that hides inside of the first plus()? There are a lot of reasons why you would want to do that, but in this case it is really just to demonstrate the principle.
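One common reason for a function that returns a function is to "pre-fill" an argument. A small sketch, where make_adder and add_ten are names I made up for illustration:

```r
library(purrr)

# A function factory: calling it returns a new function
make_adder <- function(n) {
  function(x) x + n
}

# Here the parentheses are fine, because the call itself returns a function
add_ten <- make_adder(10)
result <- map_dbl(1:3, make_adder(10))
```

So parentheses inside map are not wrong as such; they just mean "call this now and use whatever it returns as the function".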

For your third question I'm pretty sure you have a specific case in mind, so if you can make a reprex, it would help answer the question you have instead of the question I think you have.

Andrzej · May 9, 2021, 9:56am

Thank you very much @mishabalyasin,

As for my third question:
Yes, you are right, I do have a specific case in mind: I would like to extract the p-values from an object called marg_combos, described in another post of mine:
https://forum.posit.co/t/getting-all-possible-combinations-for-2x2-tables-with-fixed-margins-and-totals/103480/5

The topic was solved and I eventually found a solution, but I regard it as a workaround that uses intermediate steps, the cbind() function, Excel, etc. I was wondering whether a more elegant way exists and whether it could be done with purrr in one go, with the desired result as below or with just three columns: tab_id, mat, P_values_extracted:

This is why I am learning purrr now.
What I tried so far (taken from my previous post):

library(tidyverse)

## Constraints
r1_marg <- 20
r2_marg <- 20
c1_marg <- 29
c2_marg <- 11

## Range of values
r1c1 <- c(0:max(c(r1_marg, c1_marg)))
r1c2 <- c(0:max(c(r1_marg, c2_marg)))
r2c1 <- c(0:max(c(c1_marg, r2_marg)))
r2c2 <- c(0:max(c(c2_marg, r2_marg)))


marg_combos <- expand_grid(r1c1, r1c2, 
                           r2c1, r2c2) %>%
 filter(r1c1 + r1c2 == r1_marg & 
         r2c1 + r2c2 == r2_marg &
         r1c1 + r2c1 == c1_marg &
         r1c2 + r2c2 == c2_marg) %>% 
 tibble::rowid_to_column(var = "tab_id") %>% 
 pivot_longer(r1c1:r2c2, names_to = "pos") %>%
 group_by(tab_id) %>% 
 summarize(mat = list(matrix(value, 
                             nrow = 2, ncol = 2, byrow = TRUE))) %>% 
 group_by(tab_id) %>% 
 mutate(fisher = map(mat, fisher.test)) 

marg_combos
#> # A tibble: 12 x 3
#> # Groups:   tab_id [12]
#>    tab_id mat               fisher 
#>     <int> <list>            <list> 
#>  1      1 <int[,2] [2 x 2]> <htest>
#>  2      2 <int[,2] [2 x 2]> <htest>
#>  3      3 <int[,2] [2 x 2]> <htest>
#>  4      4 <int[,2] [2 x 2]> <htest>
#>  5      5 <int[,2] [2 x 2]> <htest>
#>  6      6 <int[,2] [2 x 2]> <htest>
#>  7      7 <int[,2] [2 x 2]> <htest>
#>  8      8 <int[,2] [2 x 2]> <htest>
#>  9      9 <int[,2] [2 x 2]> <htest>
#> 10     10 <int[,2] [2 x 2]> <htest>
#> 11     11 <int[,2] [2 x 2]> <htest>
#> 12     12 <int[,2] [2 x 2]> <htest>

Created on 2021-05-09 by the reprex package (v2.0.0)

Thanks and credits to kjhealy.

My previous attempts:

marg_combos %>% mutate(p_value=map(.$fisher,~broom::tidy(.x))) %>% unnest(p.value)

which gives:

mutate(stats = map(marg_combos$fisher, ~broom::glance(.x)))

marg_combos %>%
    split(.$mat) %>%
    map(pluck, "p.value")

on which my laptop hangs all the time

marg_combos %>% bind_rows(marg_combos) %>%   
    mutate_if(is.list, simplify_all) %>%   
    unnest()


unlist(lapply(marg_combos, function(i) i[[3]][[p.value]]))

and other errors telling me that the input should have names and that $ is not valid for atomic vectors.

Could I kindly ask you to comment on these errors? It would be very educational and help me understand.
Thank you very much indeed,

mishabalyasin · May 9, 2021, 3:27pm

You were really close with your first attempt:

library(tidyverse)

## Constraints
r1_marg <- 20
r2_marg <- 20
c1_marg <- 29
c2_marg <- 11

## Range of values
r1c1 <- c(0:max(c(r1_marg, c1_marg)))
r1c2 <- c(0:max(c(r1_marg, c2_marg)))
r2c1 <- c(0:max(c(c1_marg, r2_marg)))
r2c2 <- c(0:max(c(c2_marg, r2_marg)))


marg_combos <- expand_grid(r1c1, r1c2, 
                           r2c1, r2c2) %>%
  filter(r1c1 + r1c2 == r1_marg & 
           r2c1 + r2c2 == r2_marg &
           r1c1 + r2c1 == c1_marg &
           r1c2 + r2c2 == c2_marg) %>% 
  tibble::rowid_to_column(var = "tab_id") %>% 
  pivot_longer(r1c1:r2c2, names_to = "pos") %>%
  group_by(tab_id) %>% 
  summarize(mat = list(matrix(value, 
                              nrow = 2, ncol = 2, byrow = TRUE))) %>% 
  group_by(tab_id) %>% 
  mutate(fisher = map(mat, fisher.test)) 

marg_combos %>% 
  mutate(p_value = purrr::map_dbl(fisher, ~broom::tidy(.x)$p.value))
#> # A tibble: 12 x 4
#> # Groups:   tab_id [12]
#>    tab_id mat               fisher   p_value
#>     <int> <list>            <list>     <dbl>
#>  1      1 <int[,2] [2 × 2]> <htest> 0.000145
#>  2      2 <int[,2] [2 × 2]> <htest> 0.00334 
#>  3      3 <int[,2] [2 × 2]> <htest> 0.0310  
#>  4      4 <int[,2] [2 × 2]> <htest> 0.155   
#>  5      5 <int[,2] [2 × 2]> <htest> 0.480   
#>  6      6 <int[,2] [2 × 2]> <htest> 1       
#>  7      7 <int[,2] [2 × 2]> <htest> 1       
#>  8      8 <int[,2] [2 × 2]> <htest> 0.480   
#>  9      9 <int[,2] [2 × 2]> <htest> 0.155   
#> 10     10 <int[,2] [2 × 2]> <htest> 0.0310  
#> 11     11 <int[,2] [2 × 2]> <htest> 0.00334 
#> 12     12 <int[,2] [2 × 2]> <htest> 0.000145

Created on 2021-05-09 by the reprex package (v2.0.0)

How I usually do it is I take the first element (in your case marg_combos$fisher[[1]]), try to find a function that gives me what I want (in your case broom::tidy(marg_combos$fisher[[1]])$p.value), and then use purrr::map, substituting .x for the concrete value.

BTW, I haven't used broom that much, so there might be more elegant ways to extract p.value than using $ directly.
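One possible alternative, sketched without broom: the object returned by fisher.test is itself a list with a p.value element, so purrr can extract it by name. The two 2x2 matrices below are made-up inputs, not the ones from marg_combos:

```r
library(purrr)

# Hypothetical 2x2 tables, only to have some htest objects to work with
tests <- list(
  fisher.test(matrix(c(3, 1, 1, 3), nrow = 2)),
  fisher.test(matrix(c(10, 2, 3, 15), nrow = 2))
)

# A string as .f means "extract the element with this name"
p_by_name <- map_dbl(tests, "p.value")

# The same thing with an explicit formula-style function
p_by_lambda <- map_dbl(tests, ~ .x$p.value)
```

Inside the pipeline above this would presumably become mutate(p_value = map_dbl(fisher, "p.value")), but I have not run that on your data.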

Andrzej · May 9, 2021, 6:18pm

Thank you @mishabalyasin,

This is what I wanted.
As for my question no. 1 above about curly braces, I found an example in Jenny's tutorial:
https://jennybc.github.io/purrr-tutorial/ls01_map-name-position-shortcuts.html
It reads as follows:

library(tidyverse)
library(repurrrsive)
library(tibble)


got_chars_1 <- got_chars %>% {
  tibble(
       name = map_chr(., "name"),
    culture = map_chr(., "culture"),
     gender = map_chr(., "gender"),       
         id = map_int(., "id"),
       born = map_chr(., "born"),
      alive = map_lgl(., "alive")
  )
}


# and without curly braces outside tibble()

got_chars_2 <- got_chars %>% 
  tibble(
       name = map_chr(., "name"),
    culture = map_chr(., "culture"),
     gender = map_chr(., "gender"),       
         id = map_int(., "id"),
       born = map_chr(., "born"),
      alive = map_lgl(., "alive")
  )

Created on 2021-05-09 by the reprex package (v2.0.0)

So mainly, the difference lies in the presence or absence of the first column. What is the benefit of it?


mishabalyasin · May 10, 2021, 7:29am

The two results are different. Notice how in your second example you get a new column named "." (a single dot). You then use this dot in all the other calls, and thus you are getting kinda the right answer, but I would argue that you get it by accident, not on purpose.

In the first example the curly braces help you deal with that, since the dot is passed in without creating a column.

Keep in mind that the pipe (%>%) is shorthand for something like this:

x %>% fun(y)

# is equivalent to
fun(x, y)

Pay attention to where x ends up: as the first argument of fun. That's what Jenny is saying when she says that curly braces "... prevent got_chars from being passed as a first argument". Without them you are basically saying:

tibble(
  got_chars, 
  name = ...,
  ...
)

and that's why you are getting all of got_chars as a first column in your second case.
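The same effect can be seen on a much smaller scale with a plain list instead of a tibble. This is only a sketch with a made-up vector x, using the magrittr pipe directly:

```r
library(magrittr)

x <- c(1, 2, 3)

# No braces: x is still inserted as the first argument,
# so this is list(x, total = sum(x))
without_braces <- x %>% list(total = sum(.))

# Braces: x is only used where the dot appears,
# so this is list(total = sum(x))
with_braces <- x %>% { list(total = sum(.)) }
```

without_braces has two elements (the unnamed x plus total), while with_braces has only total, which mirrors the extra first column in the tibble case.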

Andrzej · May 10, 2021, 5:26pm

Thank you very much indeed @mishabalyasin, that was really helpful and educational.
In the meantime I found another solution (by trial and many errors) without broom.
To me it is a bit counter-intuitive, as somehow it worked with pmap(), which takes many arguments, while I gave it only one!

Here you are:

marg_combos_all_in_one_go <- marg_combos %>% group_by(tab_id) %>% mutate(p_Values_super_extracted = pmap(list(fisher), pluck("p.value")))

and full reprex:

library(tidyverse)

## Constraints
r1_marg <- 20
r2_marg <- 20
c1_marg <- 29
c2_marg <- 11

## Range of values
r1c1 <- c(0:max(c(r1_marg, c1_marg)))
r1c2 <- c(0:max(c(r1_marg, c2_marg)))
r2c1 <- c(0:max(c(c1_marg, r2_marg)))
r2c2 <- c(0:max(c(c2_marg, r2_marg)))


marg_combos <- expand_grid(r1c1, r1c2, 
                           r2c1, r2c2) %>%
 filter(r1c1 + r1c2 == r1_marg & 
         r2c1 + r2c2 == r2_marg &
         r1c1 + r2c1 == c1_marg &
         r1c2 + r2c2 == c2_marg) %>% 
 tibble::rowid_to_column(var = "tab_id") %>% 
 pivot_longer(r1c1:r2c2, names_to = "pos") %>%
 group_by(tab_id) %>% 
 summarize(mat = list(matrix(value, 
                             nrow = 2, ncol = 2, byrow = TRUE))) %>% 
 group_by(tab_id) %>% 
 mutate(fisher = map(mat, fisher.test)) 

marg_combos
#> # A tibble: 12 x 3
#> # Groups:   tab_id [12]
#>    tab_id mat               fisher 
#>     <int> <list>            <list> 
#>  1      1 <int[,2] [2 x 2]> <htest>
#>  2      2 <int[,2] [2 x 2]> <htest>
#>  3      3 <int[,2] [2 x 2]> <htest>
#>  4      4 <int[,2] [2 x 2]> <htest>
#>  5      5 <int[,2] [2 x 2]> <htest>
#>  6      6 <int[,2] [2 x 2]> <htest>
#>  7      7 <int[,2] [2 x 2]> <htest>
#>  8      8 <int[,2] [2 x 2]> <htest>
#>  9      9 <int[,2] [2 x 2]> <htest>
#> 10     10 <int[,2] [2 x 2]> <htest>
#> 11     11 <int[,2] [2 x 2]> <htest>
#> 12     12 <int[,2] [2 x 2]> <htest>


marg_combos_all_in_one_go <- marg_combos %>% group_by(tab_id) %>% mutate(p_Values_super_extracted = pmap(list(fisher), pluck("p.value")))

marg_combos_all_in_one_go
#> # A tibble: 12 x 4
#> # Groups:   tab_id [12]
#>    tab_id mat               fisher  p_Values_super_extracted
#>     <int> <list>            <list>  <list>                  
#>  1      1 <int[,2] [2 x 2]> <htest> <dbl [1]>               
#>  2      2 <int[,2] [2 x 2]> <htest> <dbl [1]>               
#>  3      3 <int[,2] [2 x 2]> <htest> <dbl [1]>               
#>  4      4 <int[,2] [2 x 2]> <htest> <dbl [1]>               
#>  5      5 <int[,2] [2 x 2]> <htest> <dbl [1]>               
#>  6      6 <int[,2] [2 x 2]> <htest> <dbl [1]>               
#>  7      7 <int[,2] [2 x 2]> <htest> <dbl [1]>               
#>  8      8 <int[,2] [2 x 2]> <htest> <dbl [1]>               
#>  9      9 <int[,2] [2 x 2]> <htest> <dbl [1]>               
#> 10     10 <int[,2] [2 x 2]> <htest> <dbl [1]>               
#> 11     11 <int[,2] [2 x 2]> <htest> <dbl [1]>               
#> 12     12 <int[,2] [2 x 2]> <htest> <dbl [1]>

Created on 2021-05-10 by the reprex package (v2.0.0)

which gives me this data frame when viewed with View(marg_combos_all_in_one_go).
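A possible explanation of why the pmap() call works, as far as I can tell: pluck("p.value") with no index simply returns the string "p.value", and purrr treats a string given as .f as "extract the element with this name". With a one-element list, pmap() then behaves like map(). A sketch with made-up 2x2 tables rather than marg_combos:

```r
library(purrr)

# pluck() with nothing to pluck by just hands back its input
extractor_name <- pluck("p.value")

# Hypothetical htest objects for illustration
tests <- list(
  fisher.test(matrix(c(3, 1, 1, 3), nrow = 2)),
  fisher.test(matrix(c(10, 2, 3, 15), nrow = 2))
)

# pmap() over a one-element list, with the string as extractor
via_pmap <- pmap(list(tests), pluck("p.value"))

# The same extraction with map(), and as a plain double vector
via_map <- map(tests, "p.value")
as_dbl <- map_dbl(tests, "p.value")
```

If this reading is right, map_dbl(fisher, "p.value") inside mutate() would give a <dbl> column instead of a list column of <dbl [1]>.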
