User experience of package management with RStudio Package Manager

We are using an evaluation version of RStudio Package Manager, which sets up different package sets and provides a URL for installing from each one. It all works very nicely.

However, from a user's perspective: when they set up a project, how should they ensure that the packages they need for that project are available? Let's assume that two different projects might need different package sets, one "latest / greatest" and one "validated". Also assume, for starters, that I'm not using Docker, since I think Docker would alleviate a lot of these issues.

I know packrat will help keep a snapshot of the packages needed for an analysis / project. If two projects use the same set of packages, will packrat recognize that they come from the same package set (e.g. "validated") and reuse it rather than re-downloading? What other options are out there (and which are recommended) for managing packages on a per-project basis?

Could the RStudio IDE have a project-specific (rather than global) setting for the package repository URL, so that each project could point to a specific RSPM repo?

Mike, this is a great question! We are working on packrat 2.0, which will be available in the coming months, along with a website that addresses these questions.

You're right, you have to manage the libraries in addition to the repositories. There are a few ways to do this:

  • One option is to use packrat (or the forthcoming packrat 2.0 project). This puts the management onus on users and relies on project-specific libraries. You can also achieve this without packrat:
      1. Create a .Rprofile in the project directory.
      2. Have the .Rprofile contain two commands: one to set the project-specific repo (using options(repos = ...)) and one to set the project-specific library (using .libPaths()).
  • Another common option for larger organizations is to tie the system library for a specific version of R to a frozen or validated repo in Package Manager. This ensures every user on that version of R has a "shared baseline". Admins can associate different R installations (and their system libraries) with different repos using Rprofile.site, and users then "context switch" by picking from the available R installations.
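A minimal sketch of the packrat-free .Rprofile approach, assuming the project is opened as an RStudio project (so R starts in the project directory and reads the .Rprofile found there). The repository URL and the "r-lib" folder name are hypothetical placeholders, not RSPM defaults; copy the real URL from the Package Manager UI for the repo the project should use.

```r
# .Rprofile in the project root (sketch only, not an official recipe).
# local() keeps the helper variable out of the global environment.
local({
  # 1. Point this project at its RSPM repository.
  #    HYPOTHETICAL placeholder URL: replace with the one Package Manager shows.
  options(repos = c(RSPM = "https://rspm.example.com/validated/latest"))

  # 2. Use a project-specific library. R starts the session in the project
  #    directory, so getwd() is the project root here.
  #    "r-lib" is a HYPOTHETICAL folder name.
  project_lib <- file.path(getwd(), "r-lib")
  # .libPaths() silently ignores directories that don't exist, so create it first.
  if (!dir.exists(project_lib)) dir.create(project_lib, recursive = TRUE)
  # R always appends its site and base libraries after the new path; the
  # user's personal library is dropped, isolating the project from it.
  .libPaths(project_lib)
})
```

After restarting R in the project, install.packages() resolves packages from that repo and, by default, installs them into .libPaths()[1], i.e. the project library. A "latest / greatest" project and a "validated" project can then each carry their own .Rprofile with a different URL. Placing just the options(repos = ...) line in an R installation's Rprofile.site is one way to point a whole installation at a frozen repo, as in the shared-baseline option.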

We've also seen organizations mix these two models: new users rely on the shared baseline, while advanced users manage project-specific libraries. This chart shows how the strategies compare and who is responsible for what:

https://colorado.rstudio.com/rsc/content/2154/viewer-rpubs-175c36566f2b8.html

We're laying quite a bit of groundwork this spring to make these use cases even easier, so stay tuned.

Mike, I agree that Docker fits in nicely here. That said, many organizations assume Docker is a silver bullet that makes work reproducible, which isn't the case. See: https://medium.com/@sean_50535/rstudio-conf-2019-the-theme-you-may-have-missed-a3e2993a8121

But Docker does make it very easy to isolate projects, so Docker plus a package management strategy (like the RSPM repositories) works really well. Essentially, Docker removes the need to worry about project-specific libraries, because the whole environment is project-specific. The repository (or package management strategy) then ensures you always get the same packages when you re-create the image.

I'm very excited about packrat 2.0 (I saw it mentioned recently in a GitHub issue). I really hope that the combination of packrat and pak will let individual projects specify package versions at a lower cost in time and disk space than is currently possible.

I've been stuck in a loop of git changes with a colleague: because we have slightly different versions of some unidentified packages, every time one of us re-runs the project, the final results come out slightly different. I know I need to get better about specifying the dependencies for my analysis.
