question re purrr and pipes

I am still trying to understand some basics of the purrr package and was wondering whether someone could explain to me why the last command below doesn't work. I understand it's something basic, and I have other ways to solve the issue, but it would help to know why it doesn't work.

library(tidyverse)
#> Warning: package 'tibble' was built under R version 3.5.2
#> Warning: package 'readr' was built under R version 3.5.2
#> Warning: package 'purrr' was built under R version 3.5.2

df2 <- structure(
  list(group = c("B-C", "A,B,C", "B A C"),
       overlap = list(logical(0), c("B", "C"), c("B", "A", "C"))),
  class = "data.frame", row.names = c(NA, -3L))
df2$overlap %>% map(.,length)
#> [[1]]
#> [1] 0
#> 
#> [[2]]
#> [1] 2
#> 
#> [[3]]
#> [1] 3
df2$overlap %>% 
  map(.,length)
#> [[1]]
#> [1] 0
#> 
#> [[2]]
#> [1] 2
#> 
#> [[3]]
#> [1] 3
df2 %>% 
  mutate(length.overlap=map(overlap, length))
#>   group overlap length.overlap
#> 1   B-C                      0
#> 2 A,B,C    B, C              2
#> 3 B A C B, A, C              3
df2 %>% 
  map(overlap, length)
#> Error in as_mapper(.f, ...): object 'overlap' not found

Hi @zoowalk! There are a few things to keep in mind when working out how pipes and purrr interact:

  1. Generally, the pipe operator inserts the thing it's passing (also represented by .) as the first argument to the function. There are some exceptions where that doesn't happen, but it's true in every case you present here.
  2. map() iterates over a list or vector. That could be, for example, a column from a data frame (which is generally a vector but might be a list if you have a list column in your data frame).
  3. It could also be a data frame: data frames are lists of columns, so if you map() over a data frame directly, each element of that iteration is a column.
  4. When you pipe, the argument insertion happens at the top level: if you pipe to mutate(), the data frame is invisibly inserted as the first argument to mutate(), not to any map() call inside mutate(). This is why your third example works: inside mutate(), the first argument to map(), overlap, is the column you're iterating over.
  5. But when you pipe directly to map(), the data frame is invisibly inserted as the first argument to map(). That could be what you want if you mean to operate on each column, but in that case the function would need to come straight after. Here, overlap lands in the second argument slot and is interpreted as the function you want to map. Since there is no object called overlap outside the data frame, the call fails with "object 'overlap' not found".
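
To make points 3 and 5 concrete, here is a minimal sketch. It rebuilds df2 with data.frame() instead of structure() purely for readability, attaches magrittr explicitly for the pipe, and uses tryCatch() only so the failing call can be inspected. The names col_classes, col_classes_explicit and err are made up for illustration.

```r
library(purrr)
library(magrittr)

# Same data as in the question, built in two steps for readability.
df2 <- data.frame(group = c("B-C", "A,B,C", "B A C"), stringsAsFactors = FALSE)
df2$overlap <- list(logical(0), c("B", "C"), c("B", "A", "C"))

# Piping a data frame into map() iterates over its columns ...
col_classes <- df2 %>% map(class)

# ... because the pipe rewrites the call to this:
col_classes_explicit <- map(df2, class)
identical(col_classes, col_classes_explicit)

# So `df2 %>% map(overlap, length)` is really map(df2, overlap, length):
# `overlap` sits in the function slot, and no such object exists.
err <- tryCatch(map(df2, overlap, length), error = function(e) e)
inherits(err, "error")
```

The exact wording of the error can differ between purrr versions, so the sketch only checks that an error is raised.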

Generally, if you want to derive one column from another, mutate() is the way to go, even if you're using map() inside of it. If you absolutely don't want to use mutate(), you can stop the pipe from inserting the argument right at the start with { braces }:

df2 %>%
  { map(.$overlap, length) }
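
As a quick check of the brace form, the sketch below compares it with the plain un-piped call. It also shows map_int(), a purrr variant that returns an integer vector rather than a list. That variant is my suggestion rather than something from your original code, and the names via_braces, direct and as_integers are only illustrative.

```r
library(purrr)
library(magrittr)

# Same data as in the question, built in two steps for readability.
df2 <- data.frame(group = c("B-C", "A,B,C", "B A C"), stringsAsFactors = FALSE)
df2$overlap <- list(logical(0), c("B", "C"), c("B", "A", "C"))

# Inside braces the pipe does not insert df2 as the first argument;
# the data frame is only reachable through the `.` placeholder.
via_braces <- df2 %>% { map(.$overlap, length) }

# The equivalent call without a pipe.
direct <- map(df2$overlap, length)
identical(via_braces, direct)

# map_int() gives a plain integer vector instead of a list.
as_integers <- df2 %>% { map_int(.$overlap, length) }
```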

For what it's worth, if I were reading someone else's code, I'd probably prefer the use of mutate(): it tells me straight away that you're creating or modifying a data frame column, which helps a lot when you can do just about anything with map()!

I hope that helps!

Excellent. Many thanks, this helps a lot!

Or df2 %$% map(overlap, length) if you attach magrittr
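
A small sketch of how that looks in practice. The exposition pipe %$% is not attached by library(tidyverse), so magrittr is loaded explicitly here. As far as I understand it, %$% evaluates the right-hand side with the columns of the left-hand side in scope, much like with(). The name via_exposition is only illustrative.

```r
library(purrr)
library(magrittr)  # provides %$%

# Same data as in the question, built in two steps for readability.
df2 <- data.frame(group = c("B-C", "A,B,C", "B A C"), stringsAsFactors = FALSE)
df2$overlap <- list(logical(0), c("B", "C"), c("B", "A", "C"))

# %$% exposes the columns of df2 by name, so `overlap` is found
# and df2 itself is not inserted as the first argument of map().
via_exposition <- df2 %$% map(overlap, length)

# Same result as indexing the column directly.
identical(via_exposition, map(df2$overlap, length))
```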
