Switching from base R to tidyverse


#1

Many of my team members are slowly making a switch from base R to tidyverse. To help their transition, I wrote a blog post that maps base R functions to equivalent tidyverse functions. I would really appreciate it if someone could review the list and point out any mistakes/corrections. Thanks!


#2

One small thing in the intro paragraph, you swapped the ‘t’ and ‘i’ in ‘switch’…

Hopefully the table below helps you swtich from base R


#3

A few mostly un-researched notes:

  • distinct works on data frames/tibbles, not vectors.
  • I would personally group do with summarize rather than with map, since I think of do and summarize as “collapsing a data frame in a controlled manner” and of map as “get an output for each input”.
  • Depending on your target audience, replicate → rerun may be helpful.
  • Since it doesn’t even seem to be in the index for purrr yet, it’s easy to miss pluck for extracting items from lists (and the use of map to do it iteratively), but it’s a common enough scenario that it seems worth listing.
  • accumulate is a good general-case cumsum-like function that could go in the “See Also” for that line.
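
A minimal sketch of three of the points above, assuming dplyr and purrr are installed. The data and the names `df`, `people`, etc. are made up for illustration. `rerun()` is left out because recent purrr releases deprecate it in favour of `map()`.

```r
library(dplyr)
library(purrr)

# distinct() de-duplicates rows of a data frame/tibble; for a bare vector, use base unique()
df <- tibble(g = c("a", "a", "b"), x = c(1, 1, 2))
df_unique  <- distinct(df)      # rows ("a", 1) and ("b", 2) remain
vec_unique <- unique(df$x)      # c(1, 2)

# accumulate() generalises cumsum(): it keeps every intermediate result of a reduction
running_total <- accumulate(1:5, `+`)                # same values as cumsum(1:5)
running_max   <- accumulate(c(3, 1, 4, 1, 5), max)   # same values as cummax()

# pluck() extracts one element; map_chr() with a name extracts it from every element
people <- list(
  list(name = "Ann", age = 30),
  list(name = "Bo",  age = 25)
)
first_name <- pluck(people, 1, "name")   # "Ann"
all_names  <- map_chr(people, "name")    # c("Ann", "Bo")
```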

#4

This always comes up in discussion on SO. Is do being deprecated in favour of map @hadley? All we have is a saved tweet from you long ago.


#5

Ah, I vaguely remember that discussion, but I never really updated my coding style to account for it (and then I worked on a problem where multidplyr was great for parallel processing, which cemented do even further in my mind). I’ll have to remember to try that next time I’m dealing with that kind of problem.


#6

I think list-columns + map() is easier to use and reason about than rowwise() + do().

There’s a bigger learning curve (particularly since we don’t have a great guide to all the ideas in one place), but I think the ideas generalise more readily to other domains.
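
To make the comparison concrete, here is a sketch that fits one `lm(mpg ~ wt)` per cylinder group of the built-in `mtcars` data both ways. It assumes dplyr, tidyr (>= 1.0, for the `nest()` behaviour used here) and purrr. The model itself is only for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

# List-column approach: one row per group, with the data and the fitted model side by side
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%                                        # `data` is a list-column of tibbles
  mutate(
    model = map(data, ~ lm(mpg ~ wt, data = .x)),   # one model per nested data frame
    slope = map_dbl(model, ~ coef(.x)[["wt"]])      # extract a single number per model
  ) %>%
  ungroup()

# do()-era equivalent for comparison; do() is superseded in current dplyr
by_cyl_do <- mtcars %>%
  group_by(cyl) %>%
  do(model = lm(mpg ~ wt, data = .))
```

The list-column version keeps `data`, `model` and any derived values in one tibble, so later steps are ordinary `mutate()` calls.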


#7

I love reading all these, because I always learn something new. I am tidyverse only and have been for about 9 months, but I had no idea if_else existed until I read this. Will switch!
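
For anyone else meeting `if_else()` for the first time: one practical difference is that base `ifelse()` drops attributes such as the `Date` class, while `dplyr::if_else()` keeps the type of its `true`/`false` arguments. A small sketch with made-up dates, assuming dplyr is installed:

```r
library(dplyr)

dates <- as.Date(c("2017-01-01", "2017-06-01"))
keep  <- c(TRUE, FALSE)

# Base ifelse() strips the Date class, so the result is a bare number
base_result <- ifelse(keep, dates, as.Date(NA))

# dplyr::if_else() is type-stable: the result is still a Date
tidy_result <- if_else(keep, dates, as.Date(NA))
```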


#8

I couldn’t agree more - list-columns have changed the way I run models against grouped data. In my experience, they’ve cut code bloat and run time down tremendously.


#9

Karl Broman wrote a nice piece in that vein, though it could use a little updating:

hipsteR: re-educating people who learned R before it was cool


#10

Thanks, @rajkorde! This is really helpful. I have been trying to use the tidyverse whenever I can but it’s hard to break old habits when they’re all you know. It’s useful to see so many side-by-side comparisons. I would love to be “tidyverse only” like @rkahne. Maybe this will get me there.


#11

Thank you all so much for your comments and suggestions. I have updated the post with all the recommended fixes… Thanks again!


#12

This is great! What would you think of adding an extra column identifying the package of the tidyverse function, maybe with links to their website where applicable? I’d be happy to help contribute to that.


#13

Thanks @rajkorde, this is such a great resource!

I am in the process of switching from plyr/reshape2 to dplyr/tidyr/purrr, and I was wondering whether anyone knows of a similar table of equivalences between plyr and dplyr?


#14

Not a table, but http://jimhester.github.io/plyrToDplyr/ has parallel code using plyr/reshape and dplyr/reshape2 going through the original plyr examples. It might still be useful to see equivalents of common operations.

At the time I made the page tidyr/purrr did not exist, which is why they don’t appear :slight_smile:


#15

I also have side by side code, base vs dplyr, for a set of data aggregation operations here:

https://jennybc.github.io/purrr-tutorial/bk01_base-functions.html

I don’t include plyr, but address it in comments. Leaving plyr behind was really really hard for me :disappointed_relieved:, but I’ve finally done it.
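
As a tiny taste of that kind of side-by-side, here is a grouped mean of `mpg` by `cyl` on `mtcars` in base R and dplyr, with the plyr form left as a comment so the snippet does not need plyr installed. The particular summary is just an illustrative choice.

```r
library(dplyr)

# Base R: aggregate() with a formula
base_means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# plyr, for reference only (not run):
# plyr::ddply(mtcars, "cyl", plyr::summarise, mpg = mean(mpg))

# dplyr: group, then summarise each group
tidy_means <- mtcars %>%
  group_by(cyl) %>%
  summarise(mpg = mean(mpg))
```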


#16

@jimhester @jennybryan Thank you both for these links!

Like you, @jennybryan, I’ve had plyr as my main coding framework in R for many, many years… I guess what makes it hard for me to switch is that I do a lot of list <-> data.frame operations (ldply being a favourite), so I have to learn not only dplyr but also purrr.
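
For the `ldply()` habit specifically, a sketch of the purrr side, assuming purrr >= 1.0.0 (for `list_rbind()`) plus dplyr (which `map_dfr()` relies on). `inputs` and `summarise_one()` are hypothetical names. `map_dfr()` still works but is marked superseded as of purrr 1.0.0.

```r
library(dplyr)
library(purrr)

# A named list of inputs, the kind of thing often fed to plyr::ldply()
inputs <- list(small = 1:3, large = 10:12)

# Hypothetical per-element function returning a one-row tibble
summarise_one <- function(x) {
  tibble(n = length(x), mean = mean(x))
}

# Row-bind the per-element results; the list names become a "group" column
out_map <- map_dfr(inputs, summarise_one, .id = "group")

# purrr >= 1.0.0 style: map() first, then list_rbind()
out_rbind <- inputs %>%
  map(summarise_one) %>%
  list_rbind(names_to = "group")
```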

Those long time habits are hard to lose!


#17

Loving this comparison, @rajkorde :smiley:

I’ve started using list columns and broom to organise my GLMs (where previously I had a named list and was using a separate data frame for the metadata, with the names as keys), and it makes them a lot easier to manage. The syntax with map is a little bit trickier than (relatively) vanilla dplyr verbs, but the benefits for keeping models and their metadata tidy are amazing.
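
A sketch of that pattern, assuming dplyr, tidyr (>= 1.0), purrr and broom. The Poisson GLM of `carb` on `wt` per `cyl` in `mtcars` is an arbitrary example, not a recommended model.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# One row per cylinder count: the data, the fitted GLM and its tidied coefficients together
glms <- mtcars %>%
  nest(data = -cyl) %>%
  mutate(
    fit   = map(data, ~ glm(carb ~ wt, family = poisson, data = .x)),
    coefs = map(fit, tidy)
  )

# Flatten the coefficients into one long table keyed by the metadata column
coef_table <- glms %>%
  select(cyl, coefs) %>%
  unnest(coefs)
```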


#18

Worth mentioning that most (all?) tidyverse functions expect a data frame as input.

On the base side, I would add the following:

  • ifelse(is.na(…), …): add complete.cases() too.
  • ifelse(…, NA): add, or replace with, mtcars[mtcars$cyl == 4, ] <- NA
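
A sketch of both suggestions next to tidyverse counterparts, assuming dplyr and tidyr. `drop_na()` and `na_if()` are my picks for the tidyverse side and may not match the exact entries in the post's table. The vector `z` stands in for a single column such as `mtcars$cyl`; the `mtcars[...] <- NA` form above blanks out whole rows instead.

```r
library(dplyr)
library(tidyr)

df <- tibble(x = c(1, NA, 3), y = c("a", "b", NA))

# Keep only rows without any missing values
base_complete <- df[complete.cases(df), ]
tidy_complete <- drop_na(df)

# Replace a specific value with NA in one column-like vector
z <- c(4, 6, 8, 4)
base_na <- z
base_na[base_na == 4] <- NA   # base subassignment
tidy_na <- na_if(z, 4)        # dplyr equivalent
```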

#19

The base equivalent of pluck() should be just [[ (no need to add lapply() and unlist()) or getElement().
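
A quick comparison on a made-up nested list (the `config` name and its values are hypothetical), assuming purrr is installed:

```r
library(purrr)

config <- list(db = list(host = "localhost", port = 5432))

via_pluck     <- pluck(config, "db", "port")
via_brackets  <- config[["db"]][["port"]]
via_recursive <- config[[c("db", "port")]]   # [[ with a character vector indexes recursively into lists
via_element   <- getElement(config$db, "port")

# Where pluck() earns its keep: a fallback value for a missing element
missing_pluck <- pluck(config, "db", "user", .default = "unknown")
```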