Where do you install packages from and to if you're using R in Docker?

I’ve found a lot of users conflate Docker with reproducibility. Docker is a unique tool because it isn’t really an environment (like a VM), nor is it just a process. It shares characteristics of both.

As for R, we’ve seen everything from users mounting their R installation and packages into the container to including lines in their Dockerfile that look like:

RUN R -e "install.packages(...)"

The latter approach is, unfortunately, not reproducible: install.packages installs whatever version is current in the repository at build time, so re-building the image later can produce a different environment. RStudio Package Manager provides an alternative because it versions every repository:

RUN R -e "install.packages(..., repos = 'url-to-repo-version')"

OR, you could refer to a repository with a curated source, frozen automatically:

RUN R -e "install.packages(..., repos = 'url-to-curated-repo')"
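
For context, a fuller Dockerfile for the pinned-repository approach might look like the minimal sketch below. The base image tag, the ARG name R_REPO, and the repository URL are all assumptions for illustration; the URL in particular is a placeholder, not a real endpoint.

```dockerfile
# Assumption: rocker/r-ver as the base image; any image that ships R would do.
FROM rocker/r-ver:4.3.1

# Hypothetical placeholder: substitute the versioned (or curated) repository
# URL served by your own Package Manager instance.
ARG R_REPO="https://packagemanager.example.com/cran/123"

# The shell expands ${R_REPO} before R runs, so every rebuild resolves
# packages against the same fixed repository version.
RUN R -e "install.packages('ggplot2', repos = '${R_REPO}')"
```

Keeping the URL in an ARG means it can be overridden with `docker build --build-arg R_REPO=...` without editing the file.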

Options outside of RStudio Package Manager might be to use packrat or devtools, e.g.

RUN R -e "install.packages('devtools'); devtools::install_version('ggplot2', version = 'x.y.z')"

OR

COPY /packrat /project/packrat

RUN R -e "install.packages('packrat'); packrat::restore()"

What do you do?


Currently, we use the packrat mechanism to achieve that. From some tests, it seems that devtools::install_version also installs dependencies, but it takes the latest version available on CRAN. So packrat is better because its lock file records the version of every package.
We use packrat::.snapshotImpl(".", snapshot.sources = FALSE) to generate the lock file, then modify this file if needed (to change the repo URL, for example) and use it to restore. We sometimes use the cache mechanism, but not with Docker.
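
As a rough illustration of how a lock-file restore could fit into an image build, here is a minimal sketch. The base image tag, the /project path, and the CRAN mirror used to install packrat itself are assumptions; it also assumes the lock file sits at packrat/packrat.lock in the build context.

```dockerfile
# Assumption: rocker/r-ver as the base image.
FROM rocker/r-ver:4.3.1

WORKDIR /project

# Copy only the lock file first, so Docker can reuse the cached restore layer
# until the lock file itself changes.
COPY packrat/packrat.lock packrat/packrat.lock

# packrat itself is installed from a regular mirror (not pinned here);
# restore() then installs the versions recorded in the lock file.
RUN R -e "install.packages('packrat', repos = 'https://cloud.r-project.org'); packrat::restore(project = '/project')"

# Copy the rest of the project after the packages are in place.
COPY . .
```

Because the snapshot was taken with snapshot.sources = FALSE, the restore step has to download package sources from the repositories named in the lock file, so those URLs must be reachable at build time.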

We are also looking into new solutions like

The most annoying thing is the system dependencies, which we have to manage separately. That is neither easy nor very automated right now. :frowning:
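
For readers unfamiliar with the problem, system dependencies usually end up as a hand-maintained layer like the sketch below. It assumes a Debian- or Ubuntu-based image, and the three libraries are only examples (the development headers commonly needed by the curl, openssl, and xml2 R packages); the real list depends on which packages the project uses.

```dockerfile
# Assumption: a Debian/Ubuntu-based R image, so apt-get is available.
FROM rocker/r-ver:4.3.1

# Install the system libraries before any R packages that compile against them.
# Example list only: adjust it to the packages your project actually needs.
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
       libcurl4-openssl-dev \
       libssl-dev \
       libxml2-dev \
    && rm -rf /var/lib/apt/lists/*
```

Nothing in the R lock file captures this list, which is why it has to be tracked by hand alongside the R package versions.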
