How can I create a package like dplyr?


Hi everybody,
I'm from Vietnam, where the R community is growing fast. But I think we have stopped at applying existing packages; our weakness is a lack of R programming skill. As a result, we can't develop packages specific to our country, such as ones that pull data from Vietnamese API sources (security prices, films, ...) or that adapt visualization tools like ggplot2 to local needs. Recently I developed my own package, VNDS, to serve people working in Vietnam's financial sector. I want to make its code cleaner and adopt the methodology of the tidyverse, but I don't know how to go about it. I think the first step is to learn how the tidyverse authors built it, but I don't know where to begin. Could you recommend what I should learn or read to understand the tidyverse from the inside? (I'm already comfortable using it in practice.) Thanks in advance!


I think Hadley's Advanced R and R Packages books (both freely available online) would be good places to start.

You can also, of course, look at the source code itself on GitHub to get an idea about some of the internals.


Mara's recommendation of those two key Hadley books is excellent.

You should approach package design from the perspective of a user. Think of the best (most useful, etc.) packages you've used; what makes them great? Try to emulate those best practices.

Good luck!


This is great advice, and more easily ignored than one might expect. Any task worth writing a package for requires some complex coding, but that complexity should be your burden, not the user's. Your users expect something as simple as:

result <- really_complex_task(my_data)

They're unlikely to bother with anything more complex.
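A minimal sketch of that idea, assuming a package that works with daily closing prices. The function names (`check_prices()`, `compute_returns()`, `summarise_returns()`) and the `date`/`close` columns are hypothetical, not part of VNDS or any real package. Validation and calculation live in internal helpers; the user only ever calls one function.

```r
# Hypothetical example: one user-facing function hides several internal steps.
# None of these names come from VNDS or any real package.

# Internal helper: stop early if the columns we rely on are missing.
check_prices <- function(prices) {
  required <- c("date", "close")
  missing_cols <- setdiff(required, names(prices))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  invisible(prices)
}

# Internal helper: period-over-period simple returns (first value is NA).
compute_returns <- function(close) {
  c(NA_real_, diff(close) / head(close, -1))
}

# The only function the user needs to know about.
summarise_returns <- function(prices) {
  check_prices(prices)
  prices <- prices[order(prices$date), , drop = FALSE]  # sort by date first
  prices$return <- compute_returns(prices$close)
  prices
}

my_data <- data.frame(
  date  = as.Date(c("2024-01-03", "2024-01-02", "2024-01-04")),
  close = c(110, 100, 99)
)
result <- summarise_returns(my_data)
```

Sorting, validation, and the return formula are all details the user never has to think about.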

Also, it may be fun to mess around with tidy eval, but in most cases:

  • The column names of datasets are already known. This is especially true in subject-specific packages, such as one wrapping an API. You could even define your own subclass of data.frame to guarantee which columns are present.
  • Functions taking a data.frame and column names can usually be rewritten to just take vectors. This also makes them more flexible.
  • If you absolutely need flexibility in working with datasets, I suggest the seplyr package. I find it much easier to program with than rlang's non-standard evaluation model.
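A rough sketch of the first two points, using only base R. `new_price_tbl()` and `pct_change()` are hypothetical names chosen for illustration. The constructor fixes the column names by building a `price_tbl` subclass of data.frame. The helper takes a plain numeric vector, so it works on a column of any data frame (or inside `dplyr::mutate()`) without any tidy eval.

```r
# Hypothetical constructor: a data.frame subclass whose columns are always
# `date` and `close`, so downstream code can rely on those names.
new_price_tbl <- function(date, close) {
  stopifnot(inherits(date, "Date"), is.numeric(close), length(date) == length(close))
  df <- data.frame(date = date, close = close)
  class(df) <- c("price_tbl", class(df))
  df
}

# Hypothetical vector-in, vector-out helper: it never sees a data frame,
# so there are no column names to quote or unquote.
pct_change <- function(x) {
  c(NA_real_, diff(x) / head(x, -1))
}

prices <- new_price_tbl(
  date  = as.Date(c("2024-01-02", "2024-01-03")),
  close = c(100, 105)
)
prices$change <- pct_change(prices$close)
```

The same `pct_change()` can be applied to any numeric vector, which is the extra flexibility the second point refers to.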