Should I move away from do() and rowwise()?

I'm not sure I understand do() correctly, but I think the answer depends on the relative cost of allocation.

do() iterates over the groups by index and runs the calculation on each one without allocating new data.frames, whereas nest() actually splits the data.frame into many pieces, which requires allocation and therefore takes time.

Put another way, nest() gives you data.frames that are already split and can be reused, while do() can't. So if you run calculations over the same data.frame many times, nest() + map() can be faster, because the cost of splitting is paid only once.

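library(dplyr)
library(tidyr)
library(purrr)
library(microbenchmark)

# xx (the data) and mean_and_sd() (the summary function) must already be defined
g <- xx %>%
  group_by(x, y)

microbenchmark(
  # do(): run the calculation twice on the grouped data
  usedo = {
    do(g, zz = mean_and_sd(.$z))
    do(g, zz = mean_and_sd(.$z))
  },
  # nest() once, then reuse the split data for both calculations
  usemap = {
    n <- nest(g)
    transmute(n, x = x, y = y, zz = map(data, ~ mean_and_sd(.$z)))
    transmute(n, x = x, y = y, zz = map(data, ~ mean_and_sd(.$z)))
  },
  times = 20
)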
#> Unit: milliseconds
#>    expr      min        lq      mean    median        uq       max neval
#>   usedo 909.9741 1040.7445 1164.2193 1190.0480 1290.1719 1361.6217    20
#>  usemap 533.2164  651.7122  735.9906  716.0828  803.2196  948.5835    20
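
In case anyone wants to try the "split once, reuse many times" idea without the original data, here is a minimal self-contained sketch. Note that xx and mean_and_sd() below are made-up stand-ins (this post doesn't show how they were defined), and the ungroup() after nest() is my own addition so that the nested result is a plain data frame regardless of tidyr version. It only shows the pattern and doesn't try to reproduce the timings above.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical stand-ins: the real xx and mean_and_sd() are not shown in this post.
set.seed(1)
xx <- tibble(
  x = sample(1:10, 1000, replace = TRUE),
  y = sample(letters[1:5], 1000, replace = TRUE),
  z = rnorm(1000)
)
mean_and_sd <- function(v) c(mean = mean(v), sd = sd(v))

g <- group_by(xx, x, y)

# do(): each call walks over the groups again.
res_do <- do(g, zz = mean_and_sd(.$z))

# nest(): pay for the split once and keep the pieces in a list-column.
n <- ungroup(nest(g))

# Reuse the same split for as many calculations as needed.
res_map1 <- mutate(n, zz = map(data, ~ mean_and_sd(.x$z)))
res_map2 <- mutate(n, z_max = map_dbl(data, ~ max(.x$z)))
```

Both approaches give one row per (x, y) group. The difference is only that n can be reused for the second (and any later) calculation without splitting again.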
