Leveraging RSPM in renv-powered Docker builds

I'm using renv to manage the specific versions of packages supporting production R code, which is built into rocker-based Docker images that are ultimately deployed. In my build logs, I see that our initial install2.R --deps=TRUE remotes calls, which happen before full package restoration, hit the RStudio Package Manager URL and install binaries super fast. :smile: But the later call to restore the exact versions of libraries needed via renv::restore() downloads source packages from CRAN, which is super slow. :frowning_face:

My renv.lock file references CRAN as the repository, which is how my local dev environments are set up. I'd like not to have devs switch from CRAN to RSPM for daily dev work, as that introduces some friction with new package versions, etc. Is it possible to set local dev environments to use CRAN for normal work and locking library versions, but have Docker prefer RSPM when restoring renv environments? Looking over https://github.com/rstudio/renv/issues/430, I suspect this just isn't possible right now, but given that guides exist for using renv with Docker, I'm hoping I'm just misreading/misunderstanding how renv works.

Thanks in advance!

What friction do you anticipate with public package manager? It's essentially a CRAN mirror, with quite a short delay from the CRAN upstream.

You could, in principle, go and edit the renv lockfile and achieve your desired outcome, but you may then still fall foul of the short time delay.

Mostly around developers having to change their workstation configs consistently, which they are loath to do. :wink: To be honest, I've not tested switching to RSPM for my own daily use, so I should give that a try. I was expecting renv to be able to transparently redirect to another source for finding packages, but that sounds like a misunderstanding on my part!

Getting fast binary builds seems like a pretty good incentive to change one line of configuration in your .Renviron file!

At the risk of going off topic, is this an .Renviron setting? If I go the suggested RSPM route of going into Preferences:Packages outside an active project and configure 'https://packagemanager.rstudio.com/all/latest' as my URL, then create a new project and activate renv, install readr, put that into a dummy script, and snapshot, I still get CRAN as the repository in the renv.lock and CRAN in the individual packages in the lockfile.

Managing repositories in RStudio Desktop when renv is in the mix seems way more complicated than I would have expected. If you set via the GUI preferences, that makes a setting in .config/rstudio/rstudio-prefs.json, which causes RStudio Desktop (1.4.x) to complain about settings being made outside of preferences (which is weird in and of itself). That setting gets ignored by renv, with CRAN being taken as the default.

This could be set with an .Rprofile setting options(repos), but renv creates its own project-level .Rprofile, so a user-level setting would get ignored. Setting via Rprofile.site might be an option, but that requires creating/modifying files in each user's R_HOME/etc, which is pretty nonstandard for many users used to working with files in their local USER_HOME directories.

TL;DR - configuring repository links with renv in a way that doesn't require devs to set custom options for every project they create seems tricky (or at least not well documented).

Happy to be corrected if I'm making this way too hard on myself somehow!

In projects using renv, the repositories used in a project are normally encoded into the project lockfile, at renv.lock. If this file exists, renv will read it and use it to set the active repositories for the project, and this will generally override any other configured repositories. In short, renv.lock becomes the "source of truth", and other settings are generally ignored.

If you need to update the repositories stored in the renv lockfile, you can run:

options(repos = <...>)
renv::snapshot()

Or, you can manually edit the lockfile itself -- it's just JSON after all.
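As a rough illustration of the manual-edit route, here is a minimal base-R sketch that swaps one repository URL for another wherever it appears in a lockfile. The helper name swap_lockfile_repo is hypothetical (it is not part of renv), and the CRAN URL in the commented usage line is an assumption; check what your own renv.lock actually records before running anything like this.

```r
# Hypothetical helper (not part of renv): replace every occurrence of one
# repository URL with another in a lockfile, treating the file as plain text.
swap_lockfile_repo <- function(path, from, to) {
  lines <- readLines(path, warn = FALSE)
  # fixed = TRUE matches the URL literally rather than as a regular expression
  lines <- gsub(from, to, lines, fixed = TRUE)
  writeLines(lines, path)
  invisible(path)
}

# Example usage (the "from" URL is an assumption about what the lockfile holds):
# swap_lockfile_repo(
#   "renv.lock",
#   from = "https://cran.rstudio.com",
#   to   = "https://packagemanager.rstudio.com/cran/latest"
# )
```

Because this is a plain text substitution, it leaves the rest of the JSON formatting untouched, but it also rewrites the file in place, so it is probably best run on a copy of the lockfile (for example inside the Docker build) rather than on the version developers commit.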

If you'd like to override the repositories used by renv during restore, you can set:

RENV_CONFIG_REPOS_OVERRIDE = https://packagemanager.rstudio.com/cran/latest

as an environment variable in the appropriate place. This will override the repositories encoded in the renv lockfile.
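For a Docker build, one way this might look is an R script run just before the restore step. This is only a sketch under a few assumptions: the image is based on Ubuntu 20.04 ("focal"), the Linux binary endpoint follows RSPM's __linux__/<distro> URL pattern, and rspm_linux_url is a hypothetical helper introduced here for illustration. Developers' machines are untouched, since the variable is only set inside the build.

```r
# Hypothetical helper: build the RSPM URL that serves Linux binaries
# for a given distro code name and snapshot.
rspm_linux_url <- function(distro, snapshot = "latest") {
  sprintf(
    "https://packagemanager.rstudio.com/cran/__linux__/%s/%s",
    distro, snapshot
  )
}

# Assumption: the Docker image is based on Ubuntu 20.04 ("focal").
repo <- rspm_linux_url("focal")

# Only attempt the restore when renv and a lockfile are actually present.
if (requireNamespace("renv", quietly = TRUE) && file.exists("renv.lock")) {
  # Override the repositories recorded in renv.lock for this session only.
  Sys.setenv(RENV_CONFIG_REPOS_OVERRIDE = repo)
  renv::restore(prompt = FALSE)
}
```

The same value could instead be set with an ENV line in the Dockerfile or in an .Renviron file that only exists in the image.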

Thanks, Kevin!

That looked promising, but when I set the override to the latest URL, requests from my renv.lock for older packages (such as lifecycle 0.2.0) fail with a package not found error. I take it RSPM (at the latest endpoint) only tracks the (near) latest release of packages. While it looks super fast, if I can't restore arbitrary versions, that defeats the purpose of renv + Docker for my use case.

I suspect the Right Way(tm) of doing this is setting a default for all developers to use RSPM for all daily work, thereby ensuring RSPM is captured in the lockfile, though my previous tests suggest that such a configuration is not straightforward. This use case of renv + Docker + RSPM seems like a core one, so I'm sure I'm a victim of my own ignorance here. Surely this isn't as complicated as it seems...

Worth noting that RSPM definitely does store the old versions of packages (https://packagemanager.rstudio.com/client/#/repos/1/packages/lifecycle - scroll to the bottom).

It's possible to configure your own RSPM in a way that prevents old versions from being seen, but the public cran source definitely behaves the same way that CRAN does.

I'm definitely not a thought leader here, but we definitely would love to be sure that the rough edges in this workflow are addressed! https://environments.rstudio.com may be a resource worth exploring if you have not yet!

@slopp's work there has been invaluable to me!