How can I create a package like dplyr?


#1

Hi everybody,
I'm from Vietnam, where the R community is growing fast. But I think we have stalled at applying existing packages, and our weakness is a lack of R programming skill. As a result, we can't develop packages specific to our country, such as ones that pull data from Vietnamese API sources (security prices, films, ...) or adapt visualization tools like ggplot2 to local needs. Recently, I developed my own package, VNDS (https://github.com/phamdinhkhanh/VNDS), to serve people in the Vietnamese financial sector. I want to make its code cleaner and adopt the methodology of the tidyverse, but I don't know how to go about it. I figured the first step was to learn how the tidyverse authors built it, but I came up empty. Could you recommend what I should learn or read to understand how the tidyverse works internally? (I'm already comfortable using these tools in practice.) Thanks in advance!


#2

I think Hadley's Advanced R and R Packages books (both freely available online at the links below) would be good places to start:

http://adv-r.had.co.nz/
http://r-pkgs.had.co.nz/

You can also, of course, look at the source code itself on GitHub to get an idea about some of the internals.


#3

Mara's recommendation of those two key Hadley books is excellent.

You should approach package design from the perspective of a user. Think of the best (most useful, etc.) packages you've used; what makes them great? Try to emulate those best practices.

Good luck!


#4

This is great advice, and more easily ignored than one might expect. Any task worth writing a package for requires some complex coding, but that should be on you, not the user. Your users expect simplicity like

result <- really_complex_task(my_data)

They're unlikely to bother with anything more complex.
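
As a rough sketch of that idea, the hypothetical function below hides validation, cleaning, and computation behind a single call. The name really_complex_task comes from the line above; the helper names, the price column, and the summary it returns are assumptions made up purely for illustration.

```r
# Hypothetical internal helpers: in a package these would not be exported,
# so users never need to know they exist.
validate_input <- function(data) {
  stopifnot(is.data.frame(data), "price" %in% names(data))
  data
}

clean_prices <- function(data) {
  # Drop rows with a missing price (an assumed cleaning step).
  data[!is.na(data$price), , drop = FALSE]
}

# The single user-facing function: all the complexity lives behind it.
really_complex_task <- function(data) {
  data <- validate_input(data)
  data <- clean_prices(data)
  data.frame(n = nrow(data), mean_price = mean(data$price))
}

my_data <- data.frame(price = c(10, NA, 14))
result <- really_complex_task(my_data)
```

The user only ever types the last line; everything else can change between versions without breaking their code.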

Also, it may be fun to mess around with tidy eval, but in most cases:

  • The column names of datasets are already known. This is especially true in subject-specific packages, like for an API. You could even make your own subclasses of data.frame to make sure you know the names.
  • Functions taking a data.frame and column names can usually be rewritten to just take vectors. This also makes them more flexible.
  • If you absolutely need flexibility in working with datasets, I suggest the seplyr package. I find it much easier to program with than rlang's non-standard evaluation model.
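
To illustrate the first two points, here is a minimal base R sketch, assuming a made-up price table. The class name price_tbl, the constructor new_price_tbl, the columns date and close, and the function simple_return are all hypothetical and not part of any existing package.

```r
# Hypothetical constructor: every price_tbl has exactly these columns,
# so package code can rely on their names without tidy eval.
new_price_tbl <- function(date, close) {
  stopifnot(length(date) == length(close), is.numeric(close))
  structure(
    data.frame(date = as.Date(date), close = close),
    class = c("price_tbl", "data.frame")
  )
}

# Takes a plain numeric vector instead of a data frame plus a column name,
# so it works on any vector, inside or outside a data frame.
simple_return <- function(x) {
  c(NA_real_, diff(x) / head(x, -1))
}

prices <- new_price_tbl(
  c("2018-01-02", "2018-01-03", "2018-01-04"),
  c(100, 110, 99)
)

# The column name `close` is guaranteed by the constructor.
prices$ret <- simple_return(prices$close)
```

Because simple_return only sees a vector, the same function can also be used inside dplyr::mutate() or on a bare vector, with no non-standard evaluation involved.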