Is colMeans(df) the same as map_dbl(df, mean)?

Rsky · April 6, 2021, 7:31am

I'm learning purrr.
I now know that furrr runs functions on multiple cores, which is why it is fast.
What I don't understand is what the benefits of purrr itself are.

For example, if I had this data,


library(tibble)

# Four numeric columns of 100 million values each
df <- tibble(
  a = rnorm(100000000),
  b = rnorm(100000000),
  c = rnorm(100000000),
  d = rnorm(100000000)
)

The three approaches below all take about the same time to run.

library(tictoc)
library(robustbase)
library(purrr)

output <- vector("double", length(df))
for (i in seq_along(df)) {
  output[[i]] <- median(df[[i]])
}

colMedians(as.matrix(df))

map_dbl(df, median)
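For anyone who wants to reproduce the comparison, here is a minimal sketch of how the three approaches could be timed side by side. It uses base R's system.time() rather than tictoc, a plain data.frame, and a much smaller assumed column length (n = 1e6) so it runs quickly; the helper median_loop() is a hypothetical name introduced only for this example.

```r
library(purrr)
library(robustbase)

set.seed(42)
n <- 1e6  # assumed size, far smaller than the 1e8 rows in the post
df <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n), d = rnorm(n))

# Hypothetical helper wrapping the explicit for loop from the post
median_loop <- function(data) {
  output <- vector("double", length(data))
  for (i in seq_along(data)) {
    output[[i]] <- median(data[[i]])
  }
  output
}

# Time each approach and keep its result for comparison
t_loop <- system.time(res_loop <- median_loop(df))
t_mat  <- system.time(res_mat  <- colMedians(as.matrix(df)))
t_map  <- system.time(res_map  <- map_dbl(df, median))

# Elapsed wall-clock seconds per approach
c(
  loop       = t_loop[["elapsed"]],
  colMedians = t_mat[["elapsed"]],
  map_dbl    = t_map[["elapsed"]]
)
```

Only four columns are iterated over here, so the per-iteration cost of the loop or of map_dbl() is negligible next to the work of computing each median. That is probably why the timings look alike.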

Can you explain the low-level differences between these approaches?

I've heard that map is implemented in C and is a bit faster, but is the difference too small to notice at this scale?
Is the other possible advantage that the code is easier to read once you get used to it?

Thank you.

martin.R · April 6, 2021, 12:27pm

colMeans() is written in C; you can check the source code.

There is essentially no difference between using colMeans(df) and purrr::map_dbl(df, mean) except that the latter has a tiny bit of additional overhead.
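A quick way to check that claim on a small example is sketched below. The data frame and its size are assumptions made for illustration; all.equal() is used instead of identical() because the two functions may accumulate floating-point error slightly differently.

```r
library(purrr)

set.seed(1)
# Small assumed example data, not the 1e8-row tibble from the question
df <- data.frame(a = rnorm(1000), b = rnorm(1000), c = rnorm(1000), d = rnorm(1000))

by_colmeans <- colMeans(df)       # converts df to a matrix, then computes in C
by_map      <- map_dbl(df, mean)  # calls mean() once per column

# Both are named double vectors; compare with a numeric tolerance
all.equal(by_colmeans, by_map)
```

Note that colMeans() first converts the data frame to a matrix, so it only works when every column is numeric, while map_dbl() accepts any function that returns a single number per column.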

Rsky · April 8, 2021, 1:57pm

@martin.R

Thank you!
