Split-apply-combine approaches



I have been using the tidyverse for a couple of months now and want to make sure I understand best practice for split-apply-combine approaches. I have been using the group_by() + do() approach and was more or less satisfied with it. But I started to wonder when I saw that do() is basically deprecated https://bit.ly/2T1LdWn. Then I found this blog post https://bit.ly/2PkUHO5 where the author describes a purrr approach (a couple of them, actually). Also, DataCamp has released a new course, ML in Tidyverse, where a purrr approach is preferred. Yet group_by() + do() still seems to be in use.

I am confused. Which approach should I use, and why? And what happens if I stay conservative and keep using group_by() + do()? At least it has a progress bar.
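To make the comparison concrete, here is a minimal sketch of the same per-group model fit written both ways. It assumes the built-in mtcars data, a simple lm(mpg ~ wt) per cyl group, and tidyr 1.0 or later for the nest(data = -cyl) syntax; none of these details come from the question itself.

```r
library(dplyr)
library(tidyr)
library(purrr)

# group_by() + do(): one model per cyl group. do() is superseded but still
# runs; it returns a rowwise tibble with a list-column of lm objects,
# one row per group, sorted by cyl.
fits_do <- mtcars %>%
  group_by(cyl) %>%
  do(model = lm(mpg ~ wt, data = .))

# purrr style: nest() packs each group's rows into a list-column of
# tibbles, then map() fits a model to each of them.
fits_nest <- mtcars %>%
  nest(data = -cyl) %>%
  mutate(model = map(data, ~ lm(mpg ~ wt, data = .x))) %>%
  arrange(cyl)  # nest() keeps first-appearance order; sort to match do()

# Pull the wt slope out of each fit to compare the two approaches
slopes_do   <- map_dbl(fits_do$model, ~ coef(.x)[["wt"]])
slopes_nest <- map_dbl(fits_nest$model, ~ coef(.x)[["wt"]])
```

Both give the same models; the nested version keeps each group's data and its model side by side in one tibble, so later steps (extracting coefficients, predicting) are just further mutate() + map() calls. On the progress-bar point: purrr 1.0.0 and later accept a .progress argument in map() and its variants.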


I think you could be interested in these resources as well:

You can do with dplyr and purrr what you could do with group_by() and do()

Why doesn't group_by %>% purrr::f() behave the same way as summarise()?
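The gist of those two threads can be sketched in a few lines. This is an illustration under assumptions (mtcars data, a mean and a per-group lm as the applied functions), not code taken from the linked threads: purrr's map functions know nothing about dplyr groups, while dplyr's group_modify() applies an arbitrary function once per group, much like do() did.

```r
library(dplyr)
library(purrr)

grouped <- mtcars %>% group_by(cyl)

# summarise() respects the grouping: one row per cyl value
by_group <- grouped %>% summarise(mean_mpg = mean(mpg))

# map_dbl() sees the data frame as a plain list of columns, so the grouping
# is ignored: one mean per column, computed over all 32 rows
per_column <- grouped %>% map_dbl(mean)

# Group-aware alternative: group_modify() calls the function once per group
# with that group's rows in .x (grouping column dropped from .x) and
# row-binds the data frames it returns, re-attaching the cyl key
coefs <- grouped %>%
  group_modify(~ {
    fit <- lm(mpg ~ wt, data = .x)
    tibble(intercept = coef(fit)[[1]], slope = coef(fit)[[2]])
  })
```

group_map() is the list-returning sibling of group_modify(), for when the per-group result is not a data frame.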

Can it be parallelized?


purrr's syntax has a parallelized companion package.

I'll let you check whether it is what you are looking for regarding parallelization.
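The package is not named in this copy of the reply; the best-known option is furrr (an assumption about what was linked), whose future_map() family mirrors purrr's map() family and takes its parallel backend from the future package's plan(). A minimal sketch, assuming mtcars, two local workers and the same per-group lm as above:

```r
library(dplyr)
library(tidyr)
library(purrr)
library(future)  # supplies plan() and the parallel backends
library(furrr)   # future_map() and friends mirror purrr's map() family

# Run on two background R sessions; multisession works on every OS
plan(multisession, workers = 2)

fits <- mtcars %>%
  nest(data = -cyl) %>%
  # same call shape as purrr::map(), but elements may run on different workers
  mutate(model = future_map(data, ~ lm(mpg ~ wt, data = .x)))

# Switch back to ordinary sequential evaluation when done
plan(sequential)
```

Parallelism only pays off when each group's work is expensive; for a model this small, the cost of starting workers and shipping data to them outweighs any gain.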

