Trying to understand pmap()

Andrzej · April 5, 2020, 10:31pm

Hi,
I am still trying to wrap my head around the purrr package.
I stumbled upon this example:

l <- list(rnorm(10),
          rnorm(100), 
          rnorm(1000))
pmap_dbl(list(l, 20, TRUE), ~ mean(..1, ..2, ..3))

I want to understand what 20 and TRUE are here for:

pmap_dbl(list(l, 20, TRUE)

and what do the ..1, ..2 and ..3 arguments stand for?

Trying to figure it out I found that:

mean(c(20, TRUE))

mean(c(20, FALSE))

give different results: 10.5 and 10 respectively.

Could somebody guide me, please?

FJCC · April 5, 2020, 11:43pm

This does not seem like a helpful example of using pmap. It iterates over the three elements of l, calling the mean function with each element of l as the first argument, 20 as the second argument and TRUE as the third.

Your example of mean(c(20, TRUE)) is completely different. It calculates the mean of 20 and TRUE, where TRUE is coerced to 1.

library(purrr)
l <- list(rnorm(10),
          rnorm(100), 
          rnorm(1000))
pmap_dbl(list(l, 20, TRUE), ~ mean(..1, ..2, ..3)) 
#> [1]  0.08393619  0.15101796 -0.03647648
#mean takes three arguments, x, trim and na.rm, see the help for mean
mean(l[[1]], 20, TRUE)
#> [1] 0.08393619
mean(l[[2]], 20, TRUE)
#> [1] 0.151018
mean(l[[3]], 20, TRUE)
#> [1] -0.03647648

#passing 20 to the trim argument of mean is equivalent to passing 0.5, see ?mean
pmap_dbl(list(l, 0.5, TRUE), ~ mean(..1, ..2, ..3)) 
#> [1]  0.08393619  0.15101796 -0.03647648

Created on 2020-04-05 by the reprex package (v0.3.0)
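A minimal sketch of the distinction above, using a made-up vector x that is not part of the original example: c() coerces TRUE to 1 and averages it with 20, whereas extra positional arguments to mean() are matched to trim and na.rm.

```r
# Illustrative vector; an assumption, not data from the thread
x <- c(1, 2, 3, 4, 100)

# c() builds one numeric vector, so TRUE becomes 1 and is averaged with 20
pooled <- mean(c(20, TRUE))

# Positional arguments after x are matched to trim and na.rm
positional <- mean(x, 20, TRUE)
named      <- mean(x, trim = 20, na.rm = TRUE)

# A trim of 0.5 or more reduces the trimmed mean to the median
identical(positional, named)
identical(positional, median(x))
```

Here pooled is (20 + 1) / 2, while positional and named both describe the same call on x.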

joels · April 6, 2020, 12:05am

That example seems unnecessarily complicated and confusing. pmap (parallel map) allows you to map over any number of arguments. So, for example, the code below runs rnorm three times, first with n=3, mean=2, and sd=1, then with n=6, mean=4, sd=3, and so on:

l = list(n=c(3,6,9), 
         m=c(2,4,6),
         s=c(1,3,5))

pmap(l, rnorm)

[[1]]
[1] 2.900625 2.851770 2.727715

[[2]]
[1] 6.209506 2.943611 6.116547 7.901074 4.114756 1.062149

[[3]]
[1]  9.968806  9.932534  4.447684 14.494424  2.027031  7.742189
[7] -5.327005  5.188974 11.654325

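pmap automatically passed the elements of l to rnorm. Because l is named, they are matched by name (m and s partially match mean and sd), which here gives the same result as entering them in order. If I wanted to change the order, I would use ..1 to refer to the first element of l, ..2 to refer to the second element, and so on: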

pmap(l, ~rnorm(..3, ..1, ..2))

[[1]]
[1] 2.088908

[[2]]
[1] 2.403335 8.907356 2.762236

[[3]]
[1] 10.6025107 -1.4235823  0.5314492  6.2786926  2.7870523
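Because rnorm output is random, the reordering is easier to see with a deterministic stand-in. The sketch below uses a hypothetical helper, describe(), which is not part of purrr or the thread; it has the same argument names as rnorm and only reports what it received.

```r
library(purrr)

# Hypothetical helper (an assumption for illustration): same argument
# names as rnorm(), but it just reports the values it was given
describe <- function(n, mean, sd) sprintf("n=%g mean=%g sd=%g", n, mean, sd)

l <- list(n = c(3, 6, 9), m = c(2, 4, 6), s = c(1, 3, 5))

# List names are used: n -> n, m -> mean, s -> sd (partial matching)
in_order <- pmap_chr(l, describe)

# ..3, ..1, ..2 are passed positionally: s as n, n as mean, m as sd
reordered <- pmap_chr(l, ~ describe(..3, ..1, ..2))

in_order
reordered
```

The second call matches the shape of the rnorm output above: one value for the first iteration, three for the second, five for the third.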

You can get the same result as in your example with map as follows:

map_dbl(l, mean, 20, TRUE)

mean has three arguments: x, the vector of values for which you want the mean; trim, the fraction of values to trim from each end of x before the mean is computed; and na.rm, which strips missing values before calculating the mean. In the map_dbl example I gave above, map_dbl iterates over each element of l. But for each iteration, we want trim=20 and na.rm=TRUE. To make that happen, we put those arguments after the mean function and they get entered by position into mean. If we don't want to have to worry about entering arguments in the right order, we can name them and they will get passed correctly:

map_dbl(l, mean, na.rm=TRUE, trim=20)

In the example you gave, list(l, 20, TRUE) is a three-element list. The first element is l, which is itself a three-element list, and the second and third elements are 20 and TRUE. pmap operates in parallel and recycles 20 and TRUE for each of the three calls it makes to mean. Not only can this be done more simply with map_dbl, using 20 for trim is confusing, because trim is only meaningful between 0 and 0.5. When a value of 0.5 or more is entered, mean silently treats it as 0.5 and returns the median.
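A small sketch of the recycling described above, on made-up data (including one NA so that na.rm matters) rather than the rnorm draws from the thread. It compares the pmap_dbl call with the simpler map_dbl form.

```r
library(purrr)

# Illustrative data; an assumption, not the random draws from the thread
l <- list(c(1, 2, 30), c(4, 5, 6, 70), c(NA, 8, 9))

# The length-one elements 20 and TRUE are recycled across the three calls
via_pmap <- pmap_dbl(list(l, 20, TRUE), ~ mean(..1, ..2, ..3))

# map_dbl iterates over l and forwards the extra arguments to mean() each time
via_map <- map_dbl(l, mean, trim = 20, na.rm = TRUE)

identical(via_pmap, via_map)
```

Both forms make the same three calls to mean; the map_dbl version simply states the fixed arguments once.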

technocrat · April 6, 2020, 3:19am

Understanding arguments to functions is something that I had a lot of trouble with (and shouldn't have in retrospect).

One of the hard things to get used to in R is the concept that everything is an object that has properties. Some objects have properties that allow them to operate on other objects to produce new objects. Those are functions.

Think of R as school algebra writ large: f(x) = y, where f is a function, x is an object termed the argument (there may be several), and y is an object termed the value, which can be as simple as a single number (a length-one atomic vector) or a very packed object with a multitude of data and labels.

And, because functions are also objects, they can be arguments to other functions, like the old g(f(x)) = y. (Trivia, this is called being a first class object.)

Although there are function objects in R that operate like control statements in imperative/procedural languages, they are best used "under the hood." As it presents to users interactively, R is a functional programming language. Instead of saying

take this, take that, do this, then do that, then if the result is this one thing, do this other thing, but if not do something else and give me the answer

in the style of most common programming languages, R allows the user to say

use this function to take this argument and turn it into the value I want for a result

Every function has a signature

pmap_dbl(.l, .f, ...)

The first two arguments are documented as

.l A list of vectors, such as a data frame. The length of .l determines the number of arguments that .f will be called with. List names will be used if present.

.f A function, formula, or vector (not necessarily atomic).
... If a formula, e.g. ~ .x + 2, it is converted to a function
... for more arguments, use ..1, ..2, ..3 etc
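To see the formula-to-function conversion in isolation, here is a sketch using purrr's as_mapper(), which performs the same conversion that pmap applies to .f. The formulas and the numbers passed in are made up for illustration.

```r
library(purrr)

# A formula using positional placeholders becomes an ordinary function
f <- as_mapper(~ ..1 + ..2 * ..3)
f(1, 2, 3)   # ..1 = 1, ..2 = 2, ..3 = 3

# The single-argument shortcut .x from the documentation excerpt
g <- as_mapper(~ .x + 2)
g(5)
```

So ..1, ..2 and ..3 are just names for the first, second and third values handed to the function on each call.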

Breaking down

pmap_dbl(list(l, 20, TRUE), ~ mean(..1, ..2, ..3))

In this call, .l is list(l, 20, TRUE) and .f is ~ mean(..1, ..2, ..3). What does it do?

Let's take the .l argument first.

str(list(l, 20, TRUE))
List of 3
 $ :List of 3
  ..$ : num [1:10] -0.638 -0.287 1.057 -2.519 -1.835 ...
  ..$ : num [1:100] -0.456 1.446 0.448 0.855 -0.106 ...
  ..$ : num [1:1000] 0.159 0.573 0.587 -0.267 0.352 ...
 $ : num 20
 $ : logi TRUE

It's a list of three elements: a list of three elements, 20 and TRUE. The list within the list is a list of three numeric vectors.

str(l)
List of 3
 $ : num [1:10] -0.638 -0.287 1.057 -2.519 -1.835 ...
 $ : num [1:100] -0.456 1.446 0.448 0.855 -0.106 ...
 $ : num [1:1000] 0.159 0.573 0.587 -0.267 0.352 ...

Subsetting it with single brackets still returns a list, here of length one:

 str(l[1])
List of 1
 $ : num [1:10] -0.638 -0.287 1.057 -2.519 -1.835 ...

We're now almost down to something we can take a mean of.

mean(l[[1]])
[1] -0.6441985
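The difference between single and double brackets can be checked directly. This sketch uses a small made-up list, so the values differ from the random draws shown above.

```r
# Illustrative list; an assumption, not the data from the thread
l <- list(c(1, 2, 3), c(10, 20), c(5, 5, 5, 5))

sub_list <- l[1]     # single brackets keep the list wrapper
first_el <- l[[1]]   # double brackets extract the numeric vector itself

is.list(sub_list)
is.numeric(first_el)
mean(first_el)
```

mean needs the numeric vector, which is why the call above uses l[[1]] rather than l[1].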

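But that's not the call pmap_dbl makes. Pooling everything into one vector averages 20 and TRUE (coerced to 1) together with the data and gives a different number:

mean(c(l[[1]],20,TRUE))
[1] 1.213168

What pmap_dbl actually runs on the first iteration is mean(l[[1]], 20, TRUE): ..1 stands for l[[1]], ..2 for 20 and ..3 for TRUE.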
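For comparison, the calls pmap_dbl makes can be written out as a plain base R loop. This is only a sketch of the behaviour on made-up data, not purrr's actual implementation; the variable names are chosen for illustration.

```r
# Illustrative data; an assumption, not the random draws from the thread
l <- list(c(1, 2, 30), c(4, 5, 6, 70), c(7, 8, 9, 10, 110))
args <- list(l, 20, TRUE)

out <- numeric(length(l))
for (i in seq_along(l)) {
  x     <- args[[1]][[i]]  # what ..1 refers to on iteration i
  trim  <- args[[2]]       # ..2, recycled: always 20
  na.rm <- args[[3]]       # ..3, recycled: always TRUE
  out[i] <- mean(x, trim, na.rm)
}
out
```

Each iteration passes one element of l as x, with 20 and TRUE supplied unchanged as trim and na.rm.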

This is a long-winded way of saying that objects in R can easily go very deep, and it pays to pause and do this kind of analysis.

Andrzej · April 6, 2020, 6:03am

Thank you @FJCC, @joels, @technocrat for your detailed explanations.
This is much better to read:

map_dbl(l, mean, na.rm=TRUE, trim=20)

than

pmap_dbl(list(l, 20, TRUE)

where arguments are not explicitly written down.
I didn't know that a bare TRUE could stand for na.rm = TRUE and that 20 means the same as trim = 20.

Thank you again. I will now study what you wrote in order to comprehend it.
kind regards,
Andrzej

Andrzej · April 6, 2020, 6:24am

When you divided this into separate one by one calls:

I tried this:

pmap_dbl(list(l, 20, TRUE), ~ mean(..1=l[[1]], ..2=l[[2]], ..3=l[[3]]))

but it returned an error:

What mistake did I make here?

joels · April 6, 2020, 7:11am

pmap_dbl(list(l, 20, TRUE), ~ mean(x=..1, trim=..2, na.rm=..3))

..1 refers to the first element of list(l, 20, TRUE), which is l.
..2 refers to the second element of list(l, 20, TRUE), which is 20.
..3 refers to the third element of list(l, 20, TRUE), which is TRUE.

Note also that the mean function's arguments are x, trim, and na.rm, rather than ..1, ..2, and ..3.
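A sketch of two ways to make the argument names explicit, on made-up data. The second form relies on the documented behaviour that list names in .l are used if present, so mean can be passed directly.

```r
library(purrr)

# Illustrative data; an assumption, not the random draws from the thread
l <- list(c(1, 2, 30), c(4, 5, 6, 70), c(7, 8, 9, 10, 110))

# Name mean()'s own arguments; ..1, ..2, ..3 appear only as values
explicit <- pmap_dbl(list(l, 20, TRUE), ~ mean(x = ..1, trim = ..2, na.rm = ..3))

# Alternative: name the elements of .l after mean()'s arguments
by_name <- pmap_dbl(list(x = l, trim = 20, na.rm = TRUE), mean)

identical(explicit, by_name)
```

In both forms the names on the left of = belong to mean, and the values come from the list given to pmap_dbl.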

Andrzej · April 6, 2020, 9:13am

Thank you @joels,
So if I understand it right, this function:

calculates mean() with options trim = 20 and na.rm = TRUE for each element of this list:

Is that correct?
If so, I think it would be nicer to create that list like this:

l <- list(a = rnorm(10),
          b = rnorm(100),
          c = rnorm(1000))

I checked that it works, but for my learning purposes it is more convenient to divide it into smaller pieces.
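One practical effect of naming the list, sketched on made-up values instead of rnorm draws: map_dbl carries the names over to its result, so individual means can be looked up by name.

```r
library(purrr)

# Illustrative named list; an assumption, not the data from the thread
l <- list(a = c(1, 2, 3), b = c(10, 20), c = c(5, 5, 5, 5))

res <- map_dbl(l, mean)
res
names(res)
res[["b"]]
```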

One more question: sometimes I see code where users put { } curly braces inside pmap(). Would that be an option in my example as well, or is it only for using tidyverse pipes and purrr together?
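A sketch that may help with the braces question, on made-up data: curly braces are ordinary R syntax for grouping several expressions, so they can follow the ~ in a purrr formula whether or not a pipe is involved. The value of the last expression is what each iteration returns.

```r
library(purrr)

# Illustrative data; an assumption, not the random draws from the thread
l <- list(c(1, 2, 30), c(4, 5, 6, 70), c(7, 8, 9, 10, 110))

braced <- pmap_dbl(list(l, 20, TRUE), ~ {
  x <- ..1                          # intermediate steps are allowed inside the braces
  mean(x, trim = ..2, na.rm = ..3)  # the last expression is the returned value
})

plain <- pmap_dbl(list(l, 20, TRUE), ~ mean(..1, ..2, ..3))

identical(braced, plain)
```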
