Future of rowwise operations: may dplyr::rowwise() become as efficient as purrr::pmap()?

Hi all,

I've noticed that dplyr::rowwise() is back on the table ("rowwise() is no longer questioning", from https://github.com/tidyverse/dplyr/blob/master/NEWS.md).

I am happy about that since the syntax is sleek, but I wonder whether there is any reason to believe that rowwise()-based workflows could become much faster in the near future?

For now, the following small benchmark, based on the not fully vectorized function dplyr::between(), shows that purrr::pmap() remains much more efficient (in both CPU time and memory) when datasets get relatively large (500,000 rows here):

library(tidyverse)

set.seed(1)
# resample iris with replacement to get 500,000 rows
iris_big <- as_tibble(iris[sample(seq_len(nrow(iris)), 5e5, replace = TRUE), ])
# shift Sepal.Width so that the test below is not just TRUE
iris_big$Sepal.Width <- iris_big$Sepal.Width + 2

test_big <- bench::mark(

  vectorised_between = {
    iris_big %>%
      mutate(test = Sepal.Width >= Petal.Length & Sepal.Width <= Sepal.Length)
  },

  pmap_between = {
    iris_big %>%
      mutate(test = pmap_lgl(list(Sepal.Width, Petal.Length, Sepal.Length), between))
  },

  rowwise_between = {
    iris_big %>%
      rowwise() %>%
      mutate(test = between(Sepal.Width, Petal.Length, Sepal.Length)) %>%
      ungroup()
  },

  iterations = 10
)

test_big
#> # A tibble: 3 x 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 vectorised_between  22.69ms  36.83ms   24.9       24.3MB    17.4 
#> 2 pmap_between          1.17s    1.26s    0.794     19.1MB     4.69
#> 3 rowwise_between       9.83s    10.3s    0.0972   101.9MB     3.73

plot(test_big)
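
Before reading too much into the timings, it may be worth checking on a small sample that the three approaches really return the same values. The following is only a minimal sketch, assuming the same shift of Sepal.Width as in the benchmark; the object names (small, vectorised, via_pmap, via_rowwise) are illustrative and not part of the benchmark itself.

```r
library(dplyr)
library(purrr)

# Small sample: the original 150 rows of iris, with the same shift as above
small <- as_tibble(iris) %>%
  mutate(Sepal.Width = Sepal.Width + 2)

# 1. Fully vectorised comparison
vectorised <- small %>%
  mutate(test = Sepal.Width >= Petal.Length & Sepal.Width <= Sepal.Length)

# 2. One call to between() per row via purrr::pmap_lgl()
via_pmap <- small %>%
  mutate(test = pmap_lgl(list(Sepal.Width, Petal.Length, Sepal.Length), between))

# 3. One call to between() per row via rowwise()
via_rowwise <- small %>%
  rowwise() %>%
  mutate(test = between(Sepal.Width, Petal.Length, Sepal.Length)) %>%
  ungroup()

# Stop with an error if the approaches disagree
stopifnot(
  identical(vectorised$test, via_pmap$test),
  identical(vectorised$test, via_rowwise$test)
)
```

If the stopifnot() call passes silently, the benchmark is at least comparing like with like.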

My immediate interest is that I will soon be trying to convert SPSS users to R, and these people deal with large datasets only. I wonder whether I could spare them the "purrring"...
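
One way to hide the purrr syntax from newcomers, without any claim that it is faster, might be to wrap between() with base R's Vectorize(), which iterates with mapply() under the hood. This is only a sketch: the name between_rows is hypothetical, and I have not benchmarked it against pmap() or rowwise().

```r
library(dplyr)

# Hypothetical helper: a version of between() that iterates over all three
# arguments in parallel. Vectorize() is base R and uses mapply() internally.
between_rows <- Vectorize(dplyr::between)

small <- as_tibble(iris) %>%
  mutate(Sepal.Width = Sepal.Width + 2)

# The call site reads like ordinary vectorised code, no pmap() or rowwise()
res <- small %>%
  mutate(test = between_rows(Sepal.Width, Petal.Length, Sepal.Length))

# Reference result from the explicit vectorised comparison
expected <- small$Sepal.Width >= small$Petal.Length &
  small$Sepal.Width <= small$Sepal.Length

stopifnot(identical(res$test, expected))
```

The loop over rows is still there, just tucked away inside the helper, so I would expect the cost to be closer to the pmap() version than to the vectorised one.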

PS: I used between() here only as an example, and I do know that many tasks can be vectorized.

Perhaps the answer to my question is yes:

Nothing to add here except for good luck and have fun converting SPSS users :slight_smile: Thanks for opening this topic, it's an interesting one - I too enjoy the rowwise/ungroup workflow.
