Many of my team members are slowly switching from base R to the tidyverse. To help with their transition, I wrote a blog post that maps base R functions to their tidyverse equivalents. I would really appreciate it if someone could review the list and point out any mistakes or needed corrections. Thanks!
One small thing in the intro paragraph: you swapped the ‘t’ and ‘i’ in ‘switch’…
Hopefully the table below helps you swtich from base R
A few mostly un-researched notes:
- `distinct` works on data frames/tibbles, not vectors.
- I would personally put … `map`, as I think of the former as “collapsing a data frame in a controlled manner” and the latter as “get an output for each input”.
- Depending on your target audience, `rerun` may be helpful.
- Since it doesn’t even seem to be in the `purrr` index yet, it’s easy to miss `pluck` for extracting items from lists (and the use of `map` to do it iteratively), but it’s a common enough scenario that it seems worth listing.
- `accumulate` is a good general-case `cumsum`-like function that could go in the “See Also” for that line.
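A minimal sketch of the notes above, assuming dplyr and purrr are installed. The `scores` and `people` objects are made-up toy data, not rows from the blog's table.

```r
library(dplyr)
library(purrr)

# distinct() is a data frame verb; for a plain vector, base unique() is the tool
scores <- c(3, 1, 3, 2, 1)
unique(scores)                          # c(3, 1, 2)
distinct(data.frame(score = scores))    # a one-column data frame with 3 rows

# pluck() pulls one element out of a (nested) list, like chained [[
people <- list(
  list(name = "Ana", langs = c("R", "SQL")),
  list(name = "Ben", langs = "Python")
)
pluck(people, 1, "langs", 2)            # "SQL", same as people[[1]][["langs"]][[2]]

# map() given a name (or position) extracts that element from every item
map_chr(people, "name")                 # c("Ana", "Ben")

# accumulate() is a general running reduction; with `+` it matches cumsum()
accumulate(1:5, `+`)                    # 1 3 6 10 15
cumsum(1:5)
accumulate(c(4, 2, 5, 1), max)          # running maximum: 4 4 5 5
```

The `map_chr(people, "name")` call is the iterative extraction mentioned above: when `map()` is given a string or number instead of a function, it extracts that element from each item, using the same rules as `pluck()`.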
This always comes up in discussions on SO. Is `do` being deprecated in favour of `map`, @hadley? All we have is a saved tweet from you from long ago.
Ah, I vaguely remember that discussion, but I never really updated my coding style to account for it (and then I had a problem where multidplyr was great for parallel processing, which cemented `do` in my mind even further). I’ll have to remember to try that next time I’m dealing with that kind of problem.
I think list-columns + `map()` is easier to use and reason about than `do()`. There’s a bigger learning curve (particularly since we don’t have a great guide to all the ideas in one place), but I think the ideas generalise more readily to other domains.
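To make the comparison concrete, here is a sketch of both styles fitting one linear model per `cyl` group of the built-in `mtcars` data, assuming dplyr, tidyr and purrr are installed. The `mpg ~ wt` formula is only an illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

# do() style: returns a rowwise data frame with a `model` list-column
models_do <- mtcars %>%
  group_by(cyl) %>%
  do(model = lm(mpg ~ wt, data = .))

# List-column style: nest each group's rows, then map() a function over them
models_nested <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    model = map(data, ~ lm(mpg ~ wt, data = .x)),
    # map_dbl() pulls a single number out of each fitted model
    wt_slope = map_dbl(model, ~ coef(.x)[["wt"]])
  )

models_nested
```

In the nested version the models sit in an ordinary tibble column next to the data they were fitted on, so further `map()` calls (or `unnest()`) can pull out whatever summaries are needed.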
I love reading all these, because I always learn something new. I am tidyverse-only and have been for about 9 months, but I had no idea `if_else` existed until I read this. Will switch!
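For anyone else who missed it, a short sketch of how dplyr's `if_else()` differs from base `ifelse()`, using made-up vectors. `if_else()` is also stricter about types: `true` and `false` must be compatible, so mixing a number and a string is an error rather than a silent coercion.

```r
library(dplyr)

x <- c(-2, 0, 3, NA)

# Base R ifelse(): NA conditions stay NA
base_labels <- ifelse(x > 0, "positive", "non-positive")
# c("non-positive", "non-positive", "positive", NA)

# dplyr if_else(): same idea, plus an explicit value for missing conditions
tidy_labels <- if_else(x > 0, "positive", "non-positive", missing = "unknown")
# c("non-positive", "non-positive", "positive", "unknown")

# ifelse() drops attributes such as the Date class; if_else() keeps them
dates <- as.Date(c("2020-01-15", "2020-02-15"))
base_dates <- ifelse(c(TRUE, FALSE), dates, as.Date(NA))   # plain numbers
tidy_dates <- if_else(c(TRUE, FALSE), dates, as.Date(NA))  # still a Date vector
```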
I couldn’t agree more: list-columns have changed the way I run models against grouped data. In my experience, they’ve cut code bloat and run time down tremendously.
Karl Broman wrote a nice piece in that vein, though it could use a little updating:
Thanks, @rajkorde! This is really helpful. I have been trying to use the tidyverse whenever I can but it’s hard to break old habits when they’re all you know. It’s useful to see so many side-by-side comparisons. I would love to be “tidyverse only” like @rkahne. Maybe this will get me there.
Thank you all so much for your comments and suggestions. I have updated the post with all the recommended fixes… Thanks again!
This is great! What would you think of adding an extra column identifying the package of the tidyverse function, maybe with links to their website where applicable? I’d be happy to help contribute to that.
Thanks @rajkorde, this is such a great resource!
I am in the process of switching from plyr/reshape2 to dplyr/tidyr/purrr, and I was wondering whether anyone knows of a similar table of equivalents between plyr and dplyr?
Not a table, but http://jimhester.github.io/plyrToDplyr/ has parallel code using plyr/reshape and dplyr/reshape2 going through the original plyr examples. It might still be useful to see equivalents of common operations.
At the time I made the page, tidyr and purrr did not exist, which is why they don’t appear.
I also have side-by-side code, base vs dplyr, for a set of data aggregation operations here:
I don’t include plyr, but I address it in the comments. Leaving plyr behind was really, really hard for me, but I’ve finally done it.
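A generic example of the kind of base vs dplyr comparison described above (not taken from the linked page): mean `mpg` per `cyl` in `mtcars`. The plyr version is left as a comment so that only dplyr is required to run it.

```r
library(dplyr)

# Base R: mean mpg per cylinder count
base_agg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# dplyr: the same grouped summary, plus a row count per group
tidy_agg <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n())

# plyr, for comparison (not run here):
# plyr::ddply(mtcars, "cyl", plyr::summarise, mean_mpg = mean(mpg))
```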
Like you, @jennybryan, plyr has been my main coding framework in R for many, many years. I guess what makes it hard for me to switch is that I do a lot of list <-> data.frame operations (`ldply` being a favourite), so I have to learn not only dplyr but also purrr.
Those long time habits are hard to lose!
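For the list <-> data.frame habits, one possible translation of the `dlply`/`ldply` round trip, sketched with base `split()`, `purrr::map()` and `dplyr::bind_rows()`. The per-group summary is an illustrative choice; purrr's `map_dfr()` combines the last two steps into one call.

```r
library(dplyr)
library(purrr)

# data.frame -> list of data frames (roughly what plyr::dlply returned)
by_cyl <- split(mtcars, mtcars$cyl)

# list -> data.frame (roughly plyr::ldply): map() over the list,
# then bind_rows() stacks the results, turning list names into an id column
summary_df <- by_cyl %>%
  map(~ data.frame(n = nrow(.x), mean_mpg = mean(.x$mpg))) %>%
  bind_rows(.id = "cyl")

summary_df
```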
Loving this comparison, @rajkorde
I’ve started using list columns and `broom` to organise my GLMs (where previously I had a named list and was using a separate data frame for the metadata, with the names as keys), and it makes them a lot easier to manage. The syntax with `map` is a little bit trickier than (relatively) vanilla dplyr verbs, but the benefits for keeping models and their metadata tidy are amazing.
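A sketch of the list-column + broom pattern described above, assuming dplyr, tidyr, purrr and broom are installed. The Gaussian GLM `mpg ~ wt + hp` on `mtcars` stands in for whatever models are actually being fitted.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# One Gaussian GLM per cylinder group, stored next to its group metadata
glm_table <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  ungroup() %>%
  mutate(
    model = map(data, ~ glm(mpg ~ wt + hp, family = gaussian, data = .x)),
    coefs = map(model, tidy)   # broom::tidy(): one row per coefficient
  )

# Flatten to one data frame with one row per (cyl, term)
coef_df <- glm_table %>%
  select(cyl, coefs) %>%
  unnest(cols = coefs)
```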
Worth mentioning that many tidyverse functions (the dplyr and tidyr verbs in particular) expect a data frame as input.
And on the base R side, I would add the following:
- `ifelse(…, NA)`: add or replace with `mtcars[mtcars$cyl == 4, ] <- NA`
- The base equivalent of `pluck()` should be just `[[` (no need to add
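A small sketch of both base-side points, using toy data. Pairing `ifelse(…, NA)` with dplyr's `na_if()` is an assumption about which row of the table is meant. The `pluck()` part shows where it and `[[` agree, and where they differ: an out-of-range index gives `NULL` (or `.default`) from `pluck()` but an error from `[[`.

```r
library(dplyr)
library(purrr)

x <- c(1, 4, 6, 4)

# Three ways to turn the value 4 into NA
na_if(x, 4)                     # dplyr
ifelse(x == 4, NA, x)           # base, vectorised if/else
x_base <- x
x_base[x_base == 4] <- NA       # base, subset assignment

# pluck() and [[ give the same result for elements that exist...
lst <- list(a = list(b = 1:3))
pluck(lst, "a", "b", 2)         # 2
lst[["a"]][["b"]][[2]]          # 2

# ...but pluck() returns NULL (or .default) where [[ would error
pluck(lst, "a", "b", 10)        # NULL
pluck(lst, "a", "b", 10, .default = NA)
# lst[["a"]][["b"]][[10]]       # Error: subscript out of bounds
```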