Package development and source control best practices



Dear all,
I’m hoping to pick the collective brain with regard to source control and R packages/products. In short, my team is developing a slew of R-based products, including APIs, Shiny dashboards, R Markdown reports, etc. Many of these rely on internal packages. For products that are consumed in production, we have separate dev, test and prod environments. We use Git.

What would be a good design pattern to support this, in terms of SCM? Should we use separate branches for each environment, e.g. to track which code is currently in test, prod, etc.? What about package/API versions? Should one use tags, or keep separate branches for all released versions?

There seem to be many ways to do this. Any insights are appreciated.


JW - what are you using to host these R-based data products?

For RStudio Connect, we've recently written about deployment workflows that use Git.

We recommend hosting internal packages in a CRAN-like repository and using versioned releases that your data products can pull in. Connect does this management automatically using packrat, and RStudio Package Manager makes the process of going from internal packages in Git to a CRAN-like repository easy. However, you could also use a similar process for dependency management using packrat on your own, and there are open source options for creating CRAN-like repositories.

What do you mean by API versions? Are you referring to R functions hosted as APIs, e.g. using the plumber package? Typically in this case the versioning is done in the path, e.g. /v1/model or /latest/model. In Connect the path can be controlled and updated using a vanity URL; alternatively, you could use a filter in plumber, or even a proxy like nginx, to direct requests to the appropriate model version.


Thanks for following up on this.
Since I wrote the original post, we've landed on this setup, which may be helpful to others:

  • Packages are delivered from RSPM, using a version numbering scheme that allows for stable production packages, as well as test and development versions. This was not easily done before using a CRAN-like repo.

  • APIs are delivered through Connect/Plumber with versioning through the path, just as Sean mentioned above. This decouples frontends from the latest API version, so we can make minor, non-interface-breaking updates without having to annoy the frontend teams.

  • The R codebase is held in Git branches, one branch for each stage in the production chain (DEV -> TEST -> PROD). Tags are used for versioning.
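
For anyone who wants to try the branch-per-environment idea from the last bullet, here is a minimal sketch in a throwaway repository. The branch names (dev, test, prod), the DESCRIPTION contents and the v0.2.0 tag are illustrative assumptions, not necessarily the exact setup described above. Promotion is shown as a fast-forward-only merge, which is just one of several ways to move code between environments.

```shell
#!/usr/bin/env sh
set -eu

# Work in a throwaway repository so the sketch is safe to run anywhere.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"   # placeholder identity
git config user.name "Example Dev"

# First commit on the development branch (branch names are hypothetical).
git checkout -q -b dev
echo "Version: 0.1.0" > DESCRIPTION
git add DESCRIPTION
git commit -q -m "Initial package skeleton"

# Create the downstream environment branches from the same starting point.
git branch test
git branch prod

# New work lands on dev first.
echo "Version: 0.2.0" > DESCRIPTION
git commit -q -am "Bump version to 0.2.0"

# Promote dev -> test, then test -> prod. --ff-only refuses the merge if the
# target branch has commits that the source branch lacks.
git checkout -q test
git merge -q --ff-only dev
git checkout -q prod
git merge -q --ff-only test

# Tag the commit that is now in production (annotated tag, hypothetical name).
git tag -a v0.2.0 -m "Release 0.2.0"

# Show which tag each environment branch currently sits on.
for branch in dev test prod; do
  echo "$branch: $(git describe --tags "$branch")"
done
```

In this sketch all three branches end up on the tagged commit, so each line of the final loop reports v0.2.0. Once dev moves ahead again, git describe on dev would show the tag plus a commit count, which gives a quick view of how far each environment is from the last release.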

This setup seems robust. One drawback of Connect compared to a Docker-based setup is that we cannot control Linux OS dependencies, which could be a problem down the road. Right now, we're trying to work around packages with system dependencies, but that can be challenging.