Dplyr: Alternatives to rowwise

jmluther · January 7, 2019, 10:11pm

Sorry to jump into a long-dead thread, but this is clearly an important topic: several SO questions addressing it are highly upvoted (e.g., this SO question), and hadley seems to be "questioning" the best approach in this GH issue.
I've read through several threads, and the consensus seems to be that pmap is preferred over rowwise.

As purely an end user, I find this approach not nearly as intuitive as the rowwise approach (at least for me), for a few reasons:

The pmap help and examples are not very informative, because pmap is lumped in with purrr::map et al. (as also pointed out below).
pmap_... doesn't quite work like the other map_ functions (see my attempts below).
I don't understand why there can't be a mutate_rowwise-type option, so that the underlying data grouping is not altered.

I'm sure much of this is just my limited understanding of the internals, but for some reason I've found the rest of the purrr framework much easier to navigate than pmap.

library(tidyverse)
mtcars %>% as_tibble() %>% 
  mutate(new_mean_var = mean(c(vs, am, gear, carb)),
         new_mean_pmap = pmap_dbl(.l = list(vs, am, gear, carb), mean), # NO
         new_mean_pmap_attempt2 = pmap_dbl(.l = list(vs, am, gear, carb), ~mean(c(vs, am, gear, carb))), # NO
         new_mean_pmap_attempt3 = pmap_dbl(.l = list(vs, am, gear, carb), function(x,y,z, zz) mean(c(x,y,z, zz))))  # YES
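The first "# NO" attempt runs without an error, which makes its failure easy to miss. A small check, assuming only purrr and the built-in mtcars data, shows what happens for row 1 (vs = 0, am = 1, gear = 4, carb = 4): the row's values are passed to mean() as separate positional arguments, and mean() only averages its first one.

```r
library(purrr)

# For row 1, pmap_dbl(list(vs, am, gear, carb), mean) effectively calls:
mean(0, 1, 4, 4)
# mean() averages only its first argument `x`; the other values are
# matched positionally to mean.default()'s `trim`, `na.rm` and `...`
# arguments, so the result is just the row's vs value.

# Across all rows, the first attempt therefore reproduces the vs column:
attempt1 <- pmap_dbl(list(mtcars$vs, mtcars$am, mtcars$gear, mtcars$carb), mean)
all.equal(attempt1, mtcars$vs)  # TRUE

# Combining the values with c() first gives the intended row mean:
mean(c(0, 1, 4, 4))  # 2.25
```

This is why the working versions in this thread all wrap the row's values in c() before calling mean().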

mishabalyasin · January 7, 2019, 10:21pm

You're right; pmap is the most confusing mapping function for me as well. I've fixed your examples to show how you can still use pmap:

library(tidyverse)
mtcars %>% as_tibble() %>% 
  mutate(new_mean_var = mean(c(vs, am, gear, carb)),
         #new_mean_pmap = pmap_dbl(.l = list(vs, am, gear, carb), mean), # NO
         new_mean_pmap_attempt2 = pmap_dbl(.l = list(vs, am, gear, carb), ~mean(c(...))),
         new_mean_pmap_attempt3 = pmap_dbl(.l = list(vs, am, gear, carb), function(...) mean(c(...)))) %>%
  select(starts_with("new_mean"))
#> # A tibble: 32 x 3
#>    new_mean_var new_mean_pmap_attempt2 new_mean_pmap_attempt3
#>           <dbl>                  <dbl>                  <dbl>
#>  1         1.84                   2.25                   2.25
#>  2         1.84                   2.25                   2.25
#>  3         1.84                   1.75                   1.75
#>  4         1.84                   1.25                   1.25
#>  5         1.84                   1.25                   1.25
#>  6         1.84                   1.25                   1.25
#>  7         1.84                   1.75                   1.75
#>  8         1.84                   1.75                   1.75
#>  9         1.84                   1.75                   1.75
#> 10         1.84                   2.25                   2.25
#> # … with 22 more rows

Created on 2019-01-07 by the reprex package (v0.2.1)
BTW, your new_mean_var is not correct, as you can see: it averages the whole columns, so every row gets the same value.
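To make the new_mean_var problem concrete: outside rowwise(), the column names inside mutate() refer to whole columns, so c() builds one long vector and a single mean is recycled to every row. For this particular case (a plain row mean), base R's rowMeans() is a vectorised alternative that needs neither rowwise() nor pmap(). This is a sketch assuming dplyr is loaded on its own; the names `mt`, `whole_col`, `row_cols` and `mt2` are just illustrative.

```r
library(dplyr)

mt <- as_tibble(mtcars)

# c(vs, am, gear, carb) is one vector of 4 x 32 = 128 values here,
# so mean() returns one number and mutate() recycles it to all rows.
whole_col <- mt %>%
  mutate(new_mean_var = mean(c(vs, am, gear, carb))) %>%
  pull(new_mean_var) %>%
  unique()
whole_col  # a single value, about 1.84

# rowMeans() computes one mean per row of the selected columns.
row_cols <- c("vs", "am", "gear", "carb")
mt2 <- mt %>%
  mutate(new_mean_rowmeans = rowMeans(mt[row_cols]))
head(mt2$new_mean_rowmeans)
```

rowMeans() only covers means (and rowSums() sums); for arbitrary per-row functions, the pmap and rowwise approaches in this thread still apply.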

jdlong · January 7, 2019, 10:30pm

Yeah, I'm pretty partial to rowwise myself. I had written up the pmap solution for The R Cookbook, 2nd Edition, and the technical reviewers just hated it; they found it really hard to grok. So I'm rolling back to rowwise.
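For comparison, here is a sketch of the same row mean written both ways, assuming dplyr and purrr are loaded individually rather than via tidyverse (object names are illustrative). rowwise() treats each row as its own group, so mean(c(...)) sees one row's values; the pmap version follows the `function(...)` fix posted above.

```r
library(dplyr)
library(purrr)

mt <- as_tibble(mtcars)

# rowwise(): each row is its own group, so vs, am, gear, carb are
# single values inside mutate(); ungroup() drops the row-wise grouping.
via_rowwise <- mt %>%
  rowwise() %>%
  mutate(new_mean = mean(c(vs, am, gear, carb))) %>%
  ungroup()

# pmap_dbl(): each row's values arrive as arguments, collected by `...`.
via_pmap <- mt %>%
  mutate(new_mean = pmap_dbl(list(vs, am, gear, carb),
                             function(...) mean(c(...))))

all.equal(via_rowwise$new_mean, via_pmap$new_mean)  # TRUE
```

The results match; the difference is mostly in how readable each form is, which is exactly the point the reviewers raised.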

@romain and @davis have been doing some interesting work with Rap:

I've not taken time to work with Rap, but it looks like a promising alternative to pmap for rowwise operations.

jmluther · January 8, 2019, 2:16pm

Thanks, and sorry: I meant it to be obvious that the first attempt (new_mean_pmap) ran but gave incorrect results; I just didn't show the output.

I like the ... syntax and will use it from here on, but I had not seen this example before(!). The help files need more use-case examples, ideally on a help page dedicated specifically to pmap.
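A related pattern worth knowing, sketched here assuming dplyr and purrr: if you hand pmap_dbl() a data frame instead of an unnamed list, each row's values arrive as *named* arguments. You can then either collect them with `...` as above, or give the function arguments with the same names as the columns.

```r
library(dplyr)
library(purrr)

mt <- as_tibble(mtcars)
cols <- select(mt, vs, am, gear, carb)  # a data frame is a named list of columns

# Option 1: collect the named row values with `...`
by_dots <- pmap_dbl(cols, function(...) mean(c(...)))

# Option 2: name the function's arguments after the columns, so the body
# can refer to individual values by name
by_name <- pmap_dbl(cols, function(vs, am, gear, carb) mean(c(vs, am, gear, carb)))

all.equal(by_dots, by_name)  # TRUE
```

With option 2, every column must match an argument name: passing the full mtcars would fail with an "unused argument" error, which is why the columns are selected first.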

jmluther · January 8, 2019, 2:22pm

Thanks, and I'm glad to hear that people smarter than me are trying to come up with a better solution.
The tidyverse approach is so popular because it is intuitive and clear, and the code is readable. The pmap approach does not quite fit that (in its current form), imo.
rowwise is very clear and does what I need, but I recognize that it may alter the data_frame attributes (or some issue like that).
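On the attributes point: as far as I can tell, what rowwise() changes is the class (it adds "rowwise_df"), and ungroup() removes it again. A minimal sketch, assuming dplyr; `mutate_rowwise()` below is a *hypothetical* helper along the lines of the option asked about at the top of the thread, not a dplyr function.

```r
library(dplyr)

mt <- as_tibble(mtcars)

rw <- mt %>% rowwise()
class(rw)                   # "rowwise_df" is added in front of the tibble classes
inherits(rw, "rowwise_df")  # TRUE

# Hypothetical helper (not part of dplyr): mutate row by row, then drop
# the row-wise grouping so a plain tibble comes back.
# Assumes .data is not already grouped; ungroup() would drop those groups too.
mutate_rowwise <- function(.data, ...) {
  .data %>%
    rowwise() %>%
    mutate(...) %>%
    ungroup()
}

res <- mutate_rowwise(mt, new_mean = mean(c(vs, am, gear, carb)))
inherits(res, "rowwise_df")  # FALSE
```

Wrapping rowwise() and ungroup() like this keeps the row-wise grouping from leaking into later steps of a pipeline.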