docker buildkit cache package build and install?

I'm trying to use the docker buildkit approach to caching packages to speed up adding packages to docker containers. For Python and apt-get I am able to get this to work, but I can't get it to work for R packages. In a Dockerfile for Python I'm able to change:

RUN pip install -r requirements.txt

to the following (the comment-looking syntax line is required and must be the first line of the Dockerfile):

# syntax=docker/dockerfile:experimental
RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt

Then, when I add a package to the requirements.txt file, pip reuses the work it has already done rather than re-downloading and rebuilding every package. So BuildKit cache mounts add a level of caching beyond Docker's image layers. It's a massive timesaver. Check out the instructions for both Python and apt-get packages and the useful answer on caching Python packages. I'm hoping to set up something similar for R packages.

Here is what I've tried, which works for apt-get but not for R packages. I've also tried it with the install2.r script.

# syntax=docker/dockerfile:experimental
FROM rocker/tidyverse
RUN rm -f /etc/apt/apt.conf.d/docker-clean; echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
RUN --mount=type=cache,target=/var/cache/apt --mount=type=cache,target=/var/lib/apt \
  apt update && apt install -y gcc \
      zsh \
      vim

COPY ./requirements.R .
RUN --mount=type=cache,target=/usr/local/lib/R/site-library Rscript ./requirements.R

Anyone have this working?

Do you specifically want to use BuildKit?

Personally, I use renv with Docker, taking advantage of renv's caching mechanism and mounting the cache directory in a Docker volume for persistence.

https://rstudio.github.io/renv/articles/docker.html#running-docker-containers-with-renv

I have never tried BuildKit, so I don't know whether the two can be combined, but I guess they can. What do you think?

Just sharing in case it could be useful.

Thanks. Yes, renv seems the easiest way to go, with the renv library retained on the machine that runs the Docker BuildKit builds and only the libraries each container needs moved into that container.

FWIW, I don't love the idea of mounting the renv cache, since the containers then don't actually contain the libraries if you move them to a different host machine. For a purely local laptop setup, though, this would work.

From chatting with a few people, it seems the reason the BuildKit caching approach doesn't work for R out of the box is that the R packaging system does the install in place (not entirely sure) and doesn't retain a copy of the downloaded code or build artifacts.

I kept playing with this and I have it working in a simple case, perhaps of interest.

The key was the renv::isolate call, which separates the project library from the build-time shared cache mount, leaving the shared cache mount available for building other images in the future.
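
For anyone wanting to try the same idea, here is a minimal sketch of how it might look. This is not necessarily the exact setup described above. It assumes the project has already been initialised with renv, so that renv.lock, .Rprofile and renv/activate.R exist next to the Dockerfile. The /project and /renv/cache paths are arbitrary choices for illustration. RENV_PATHS_CACHE is the environment variable renv reads to locate its cache.

```dockerfile
# syntax=docker/dockerfile:experimental
FROM rocker/tidyverse

# Assumption: /project is an arbitrary working directory for this sketch.
WORKDIR /project

# Assumption: these files were generated earlier by renv::init() on the host.
COPY renv.lock renv.lock
COPY .Rprofile .Rprofile
COPY renv/activate.R renv/activate.R

# Point renv at the directory that is backed by the BuildKit cache mount.
# Assumption: /renv/cache is an arbitrary path chosen for this sketch.
ENV RENV_PATHS_CACHE=/renv/cache

# renv::restore() installs packages into the shared cache and links them
# into the project library. renv::isolate() then copies them out of the
# cache into the project library, so the image keeps working after the
# cache mount is detached at the end of this RUN step.
RUN --mount=type=cache,target=/renv/cache \
    Rscript -e 'renv::restore(); renv::isolate()'
```

Without the isolate step, the project library would contain only links into /renv/cache, which is not part of the final image. With it, the packages live in the image itself, and the cache mount stays on the build machine to speed up later builds.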
