Understanding wrappers in purrr

#1

Hello there,

I am looking at Jenny's nice tutorial here https://github.com/jennybc/row-oriented-workflows/blob/master/ex06_runif-via-pmap.md and I am a bit puzzled by one of the examples here.

In particular, I do not understand what's going on with this snippet

A: Write a wrapper around  `runif()`  to say how df vars <–> runif args.

## wrapper option #1: 
## ARGNAME = l$VARNAME 
my_runif <- function(...) {
  l <- list(...)
  runif(n = l$alpha, min = l$beta, max = l$gamma)
}

When I use pmap in pmap(foofy, my_runif), what is the input argument of my_runif? Is it the data frame? What is the ... doing here? Also, why do we need to convert to a list in the second step?

Thanks for your help!


#2

pmap iterates in parallel over the items in a list. Since a data frame is a list of variables, when a data frame is passed to pmap, it iterates over its rows, passing each to the function in turn.

In this case, my_runif is a function that accepts anything (...), and then looks for arguments within what it has been passed named alpha, beta and gamma, and passes them to runif's n, min, and max parameters, respectively. This will get called for each row of foofy, using its values as arguments.
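
To make that concrete, here is a minimal sketch of what pmap(foofy, my_runif) amounts to. The foofy below is a made-up stand-in (the tutorial's actual data may differ); only the column names alpha, beta and gamma matter.

```r
library(purrr)

# Hypothetical stand-in for the tutorial's foofy; the values are assumptions.
foofy <- data.frame(
    alpha = 1:3,           # ends up as runif's n
    beta  = c(0, 10, 20),  # ends up as runif's min
    gamma = c(1, 11, 21)   # ends up as runif's max
)

my_runif <- function(...) {
    l <- list(...)
    runif(n = l$alpha, min = l$beta, max = l$gamma)
}

# pmap() calls my_runif once per row, passing that row's values as named arguments.
via_pmap <- pmap(foofy, my_runif)

# Roughly the same thing written out by hand:
by_hand <- list(
    my_runif(alpha = 1, beta = 0,  gamma = 1),
    my_runif(alpha = 2, beta = 10, gamma = 11),
    my_runif(alpha = 3, beta = 20, gamma = 21)
)

lengths(via_pmap)  # one draw for row 1, two for row 2, three for row 3
#> [1] 1 2 3
```

So the input to my_runif is never the whole data frame, only one row's worth of values at a time.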

As for the second step of collecting the dots into a list, it's not strictly necessary (there are other ways of accessing or passing on the dots), but it is common when using dots in a function: it forces evaluation and yields an ordinary list of the arguments passed to ..., which is easy to subset by name.

Before they're collected with list (or c, types permitting), dots don't behave like a single object, but like all the objects passed into them. That makes it really easy to forward them to another call, but hard to do much else with them. Collecting dots really is redirection: all the arguments passed to ... are passed directly to list, which is why the result has a length of three in this case instead of one, as you'd get by passing any other single symbol to list.
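
A small sketch of that difference, using a throwaway helper (count_dots is just an illustrative name):

```r
# Hypothetical helper: collect the dots and count what arrived.
count_dots <- function(...) length(list(...))

count_dots(1, 'b', TRUE)  # three arguments went into the dots
#> [1] 3

x <- c(1, 2, 3)
length(list(x))           # an ordinary symbol is always a single object
#> [1] 1
```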


#3

Thanks Alistaire!

"As for the second step of collecting the dots into a list, it's not strictly necessary (there are other ways of accessing or passing on the dots), but is common when using dots in a function, as it forces evaluation and makes them easy to subset by name, as the result is an ordinary list of the arguments passed to ...."

What are these other ways of accessing the dots? I tried to use the same syntax on a simple data frame, and it does not seem to split the data frame into a named list.

test <- tibble(text = c('alistaire'),
               value = c(1))

Here list() just returns a one-element list containing the whole tibble:

> list(test)
[[1]]
# A tibble: 1 x 2
  text      value
  <chr>     <dbl>
1 alistaire     1

Am I missing something?
Thanks again!


#4

Programming with dots

Collecting dots

Dots only work within functions, as they're defined by arguments passed to a function. To see what they look like, you have to collect them, e.g.

collect_dots <- function(...){
    list(...)
}

collect_dots('a')
#> [[1]]
#> [1] "a"
str(collect_dots(3:5, cos(pi), list('hi')))    # str prints lists more compactly
#> List of 3
#>  $ : int [1:3] 3 4 5
#>  $ : num -1
#>  $ :List of 1
#>   ..$ : chr "hi"

Calling list on the dots collects them into a single object, evaluating them in the process (note that cos(pi) is now -1).

Accessing dots

An alternative to collection is the numeric accessors: ..1 refers to the first argument passed to ..., ..2 to the second, etc. R 3.5.0 added a ...elt() function, where ...elt(2) is equivalent to ..2, and a ...length() function, which returns the number of arguments passed to .... For documentation, see ?dots.

second <- function(...){
    message('2nd of ', ...length())
    ..2
}

second(1:4, 'b', tan(0))
#> 2nd of 3
#> [1] "b"

What's useful about this notation is that it evaluates only the argument referred to, not everything, as list does. This matters because sometimes you don't want a long-running argument to evaluate. For example, the first of these takes 3 seconds because Sys.sleep(3) gets evaluated, whereas the second is effectively instantaneous because Sys.sleep(3) never gets called:

system.time(collect_dots(Sys.sleep(3), 'hlo'))
#>    user  system elapsed 
#>   0.000   0.000   3.003
system.time(second(Sys.sleep(3), 'howdy'))
#> 2nd of 2
#>    user  system elapsed 
#>   0.003   0.000   0.003

This is consistent with functions' treatment of parameters, which are only evaluated if they are referred to in the function. A function that doesn't use any parameters will never evaluate what you pass it, e.g.

one <- function(x) 1
one(stop("Error!"))
#> [1] 1

Collecting dots without evaluation

There are ways to collect dots without evaluating them, but this steps into operating on the language, which is a more advanced topic.

A quick example of collecting dots without evaluation (ignore if you like):
collect_dots_2 <- function(...) substitute({...})
str(collect_dots_2('hi', 2, tan(pi)))
#>  language {  "hi"; 2; tan(pi) }

Using alist instead of braces is more practical for operating on the calls, but the above illustrates what's happening better. ?match.call also collects but does not evaluate dots (as part of the whole call).
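
For completeness, a sketch of the alist variant mentioned above (collect_dots_3 is an illustrative name). The result is an ordinary list whose elements are the unevaluated expressions, so each one can be inspected or evaluated separately:

```r
# substitute() splices the dots into the alist() call; eval() then runs alist(),
# which returns its arguments without evaluating them.
collect_dots_3 <- function(...) eval(substitute(alist(...)))

exprs <- collect_dots_3('hi', 2, tan(pi))
length(exprs)
#> [1] 3
exprs[[3]]                 # still a call, not a number
#> tan(pi)
is.numeric(eval(exprs[[3]]))  # evaluate it only when needed
#> [1] TRUE
```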

Passing dots to another function

Dots are not evaluated if they're passed directly to another function, either (though they're usually evaluated by that function). Because they're not collected, that allows them to be spliced into the parameters of the function they're passed to. For example, the following function is mean but with na.rm = TRUE as the default. Because everything is passed through ..., I can still pass a trim argument:

mean_without_NAs <- function(..., na.rm = TRUE){
    mean(..., na.rm = na.rm)
}
mean_without_NAs(c(0, NA, 47, 94))
#> [1] 47
mean_without_NAs(c(1, 2, 3, 100000), trim = 0.25)
#> [1] 2.5

This splicing behavior shows that collecting dots is really a special case of passing them on: they're passed to a function, like list or c, that assembles them into a single object. Thus collect_dots above is effectively just an alias for list.
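
A quick check of that claim, with collect_dots redefined here so the snippet stands alone:

```r
collect_dots <- function(...) list(...)

# Names and positions pass straight through the dots into list().
identical(
    collect_dots(1, b = 'two', 3:4),
    list(1, b = 'two', 3:4)
)
#> [1] TRUE
```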

Note that when calling mean_without_NAs, we still have to collect the values we want the mean of with c, because mean takes them through its x parameter. We could make a version of mean that accepts dots (like sum) by collecting the dots inside the function (here with c instead of list, as mean takes a vector, not a list). The other parameters then have to be added to the wrapper function explicitly, because the dots are now passed to c for collection instead of on to mean.

mean_of_dots <- function(..., trim = 0, na.rm = FALSE){
    mean(c(...), trim = trim, na.rm = na.rm)
}
mean_of_dots(1, 5, 10, 47)
#> [1] 15.75
mean_of_dots(1, 3, NA, 5, na.rm = TRUE)
#> [1] 3

What pmap does

What a data frame is

To understand what purrr::pmap does when applied to a data frame, you have to think of the data frame as a list. In fact, a data frame is a list, with a few restrictions and a bit of fanciness like rownames. To see the underlying list, call unclass on a data frame:

library(tidyverse)

some_data <- tibble(
    x = 1:2, 
    y = c('a', 'b')
)

str(unclass(some_data))
#> List of 2
#>  $ x: int [1:2] 1 2
#>  $ y: chr [1:2] "a" "b"
#>  - attr(*, "row.names")= int [1:2] 1 2

You can call pmap on a non-data frame list of this same structure, and you'll get the same result—pmap doesn't care about the class, only the structure.
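
A minimal sketch of that equivalence, using a base data.frame and a plain list with the same columns (as_df and as_list are illustrative names):

```r
library(purrr)

as_df   <- data.frame(x = 1:2, y = c('a', 'b'), stringsAsFactors = FALSE)
as_list <- list(x = 1:2, y = c('a', 'b'))

# Same structure in, same result out, regardless of class.
pmap_chr(as_df, paste)
#> [1] "1 a" "2 b"
pmap_chr(as_list, paste)
#> [1] "1 a" "2 b"
```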

What pmap gets passed

To see what goes into the function passed to pmap, pass list as that function. It collects the arguments that pmap would splice into any other function, so each element of the result is the set of arguments for one call:

some_data %>% 
    pmap(list) %>% 
    str()
#> List of 2
#>  $ :List of 2
#>   ..$ x: int 1
#>   ..$ y: chr "a"
#>  $ :List of 2
#>   ..$ x: int 2
#>   ..$ y: chr "b"

(If you like, you can think of purrr::transpose(some_data) as a shortcut for pmap(some_data, list).)

Calling pmap on functions that take dots

To pass the data in some_data through pmap to a function that does more than list, let's try paste. Since paste always returns a character vector, we'll use the pmap_chr version, which will simplify the resulting list to a character vector for us:

some_data %>% pmap_chr(paste)
#> [1] "1 a" "2 b"

Because some_data has two rows (each element in the list is length two), paste is getting called twice and the whole call is equivalent to

c(paste(1, 'a'), paste(2, 'b'))
#> [1] "1 a" "2 b"

This doesn't do anything particularly useful in this case, but use-cases certainly exist.

Calling pmap on functions with named parameters

Note that list and paste both accept dots, into which the arguments get spliced. If the function you're mapping does not take its data through dots, the names matter: the arguments are passed in with their names and are matched to the corresponding parameters, just as in

do.call(mean, list(x = c(2, NA, 47), na.rm = TRUE))
#> [1] 24.5

TRUE gets passed to na.rm, not trim, even though trim comes before na.rm in mean's default method, because the argument is named. Thus, in the following, each pair of x and na.rm values is matched by name:

list(
    x = list(1:5, c(1, NA)), 
    na.rm = c(TRUE, FALSE)
) %>% 
    pmap_dbl(mean)
#> [1]  3 NA

If the names of the data frame or list don't line up with the parameter names, you won't get what you want unless you rename in some fashion. That is what the wrapper in the tutorial linked in the original post is for: it maps the data frame's variable names onto runif's argument names.
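
Two ways that renaming could look, as a rough sketch. The foofy here is a made-up stand-in for the tutorial's data, and my_runif_named is an illustrative name, not necessarily what the tutorial uses:

```r
library(purrr)

# Hypothetical data; only the column names alpha, beta, gamma matter.
foofy <- data.frame(alpha = 1:3, beta = c(0, 10, 20), gamma = c(1, 11, 21))

# Option A: rename the columns so they match runif()'s parameter names.
renamed <- setNames(foofy, c('n', 'min', 'max'))
set.seed(42)
res_renamed <- pmap(renamed, runif)

# Option B: keep the column names and translate them in a wrapper
# whose parameters are named after the columns.
my_runif_named <- function(alpha, beta, gamma) {
    runif(n = alpha, min = beta, max = gamma)
}
set.seed(42)
res_wrapper <- pmap(foofy, my_runif_named)

lengths(res_wrapper)
#> [1] 1 2 3
```

With the same seed, both options should make the same runif calls in the same order, so the two results are expected to match.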


#5

really cool, man, I love it. Thanks!


#6

This is great, @alistaire!
I think you've covered what's in this, but Lionel recently wrote up a section on The ... argument here:


#7

thank you mara this is really cool!