Packrat: Use multiple lock files to speed up installation in Docker?

I am using packrat::restore with a packrat.lock file to install the package dependencies of my project in a Docker container. Whenever I make a change to the packrat.lock file, the packrat::restore step is re-executed instead of being served from the Docker cache, and all packages are reinstalled, which takes a long time.

I think it would be great to be able to split the packrat.lock file into multiple smaller ones, for example to separate the packages that are essential for my project (e.g. shiny, stringr, data.table, ...) from packages I might remove again because I am just trying out a new feature. The first packrat.lock file would rarely change, so Docker could cache that layer, which would significantly speed up my build times.

I'm interested in what you think about this.

Hi Markus,

Interesting question!

For starters, be sure you are placing your packrat::restore command at or near the end of the Dockerfile. This will prevent changes in your packrat.lock file from busting the remainder of your Docker cache.
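As a rough sketch, that layering could look like the following (the base image, paths, and the packrat bootstrap line are assumptions on my part, not something from your post):

```dockerfile
FROM rocker/shiny:3.5.2

WORKDIR /app

# Copy only the lockfile first: this layer and the restore below are
# rebuilt only when packrat.lock itself changes.
COPY packrat/packrat.lock packrat/packrat.lock
RUN Rscript -e 'install.packages("packrat"); packrat::restore(project = ".")'

# Copy the application code last, so code-only changes do not
# invalidate the cached restore layer above.
COPY . /app
```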

I'm not totally sure I understand the remainder of your workflow. For instance, why not install the experimental packages into a running container, and only snapshot if you are happy with the results? The lockfile is designed to capture requirements for the final environment, not dictate the environment ahead of time. Alternatively, you could do your experimental work using a different lockfile, and only merge the results after the fact (similar to branching and merging your source code in Git).
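For the experimental route, something along these lines (the image and package names here are invented for illustration):

```sh
# Start an interactive R session in a container built from your image
docker run -it --rm my-shiny-image R

# Inside R: try the experimental package without touching the lockfile
#   install.packages("plumber")
# ...experiment...
# Only if you decide to keep it, record it in the lockfile:
#   packrat::snapshot()
```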

Unfortunately, it would be very difficult to support a notion of separable lockfiles - the reason is that R packages exist in complex dependency networks. To create separable lockfiles you'd need separable dependency graphs, which is unlikely. For example, if your "core" environment has shiny, and your new environment wants to add plumber, those would not be separable, because they both depend on httpuv.


Hi Sean,

thanks for your thoughts!

I slightly disagree with your first point, because everything you do with the R code must come after the package installation. And if I just fix, say, one typo or a small bug in a script, I do not want packrat::restore to re-execute, since it takes 95% of the build time. Or did I misunderstand you here?

I think the question is: what does "final environment" mean? If it is a software project which continuously evolves, there could be releases every week, or I might be testing every commit with some CI solution, and then it would be nice if I could preview my shiny app on a testing server almost immediately. Right now I have to wait 45-60 minutes until the Docker image is built. My current workaround is to change the lockfile as rarely as possible.

Hmm, without having looked too deeply into the code, I think separate lock files could be possible without too much effort (but I could also be completely wrong :sweat_smile:). The packrat::restore function would just need to check whether a package is already installed and install it only if necessary.

In your example the first packrat lockfile would contain shiny and httpuv, so a first packrat::restore call would install those packages. When restore is then called a second time with the second lockfile, which contains httpuv and plumber, only plumber would have to be installed. Of course it would not be possible to have different httpuv versions.
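A hypothetical Dockerfile sketch of that idea (the lockfile names are invented, and it assumes restore would skip packages already installed at the locked version - i.e. the behaviour proposed above, not necessarily what packrat does today):

```dockerfile
# Hypothetical: restore a rarely-changing "core" lockfile first;
# this layer stays cached across builds.
COPY packrat/core.lock packrat/packrat.lock
RUN Rscript -e 'packrat::restore(project = ".")'

# Only this layer is rebuilt when the experimental lockfile changes;
# packages already present (e.g. httpuv) would be skipped.
COPY packrat/extra.lock packrat/packrat.lock
RUN Rscript -e 'packrat::restore(project = ".")'
```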

Interesting - so is this your workflow?

  1. Update R code (optionally updating packrat.lock file)
  2. Commit to Git
  3. Post-commit, a Docker image is built that calls packrat::restore and runs the Shiny app in open-source Shiny Server

How frequent are the commits? I imagine you'd mostly be "testing" things in the app locally prior to committing?

One thing you could do is mount a packrat cache into the container. This is essentially how RStudio Connect works, and it allows for almost instant re-deploys.
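For example, roughly like this at run time (the host path is made up, and R_PACKRAT_CACHE_DIR / use.cache reflect packrat's cache settings as I understand them - double-check the packrat docs):

```sh
# Mount a shared packrat cache from the host and point packrat at it;
# the project also needs packrat's cache enabled (use.cache = TRUE).
docker run \
  -v /host/packrat-cache:/packrat-cache \
  -e R_PACKRAT_CACHE_DIR=/packrat-cache \
  my-shiny-image
```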

The real solution here would be to have Linux binaries available for packages in a repository - this would be the equivalent of a "cache", but not mounted in through a directory. They would allow for instantaneous installs. Linux binaries are something we're working on right now(!!). What base image do you use?


To avoid this in Docker, I first copy only packrat/packrat.lock and build the required packages, which creates the environment for the code. Then, in a later line in the Dockerfile, I copy the rest of the code into the container and execute what needs to be executed. If I make just a small change to the code, with no change in dependencies, the whole packrat workflow is skipped because the lockfile did not change. It helps to iterate on Docker image development.

I can confirm that the packrat cache is another way of improving deployment speed, and it is pretty easy to set up, too.

Yes, that is exactly my workflow. Most testing is done locally, and right now the Docker build is only triggered for specific commits (i.e. tags). It currently takes about 30 minutes to build the Docker image, so it would still be nice if this could be made faster.

The base image is the rocker/shiny:3.5.2 image.