packrat speed issues


#1

Hi all,
We're implementing R as the data science / ML infrastructure in our large company (1500+), using RStudio Server, RStudio Connect and RSPM, combined with git/TFS as the primary application lifecycle platform. For reproducibility, collaboration and versioning, we're using git/packrat/r-packages for all products. However, packrat quickly gets very slow on snapshots and inits, especially for projects with many dependencies and local packages.

Is there any work ongoing on bringing this project up to speed?

Thanks again,
Johannes W


#2

I forget where many of the slow-downs are in packrat... I know that there is a general desire to improve packrat in many of the areas where it struggles, although I am unsure what the timeline / prioritization of such a task looks like. Many of the supporting tools that would enable such an improvement seem to be under development in r-lib.

As for improving speed in the short term, you might try packrat::.snapshotImpl(".", snapshot.sources = FALSE) to generate a packrat.lock file "quickly" and see what the performance looks like there. Unfortunately, init is probably the slow piece (pulling all of the sources, installing packages, etc.) and also the piece that you care about, since it encapsulates the actual environment you are using. Are you familiar with the packrat "global cache"? That is another piece that can be very useful for optimizing performance / reducing the number of redundant re-installs.
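
To make the two suggestions above concrete, here is a rough sketch that turns on the global cache for a project and times a snapshot that skips downloading sources. The helper name quick_snapshot is made up for illustration, and it assumes packrat is installed and that the given directory is already a packrat project. Whether the cache helps will depend on your platform and packrat version.

```r
# Hypothetical helper (not part of packrat): enable the global cache for a
# project, then time a snapshot that does not fetch package sources.
quick_snapshot <- function(project = ".") {
  stopifnot(requireNamespace("packrat", quietly = TRUE))

  # Turn on the global cache so that packages already installed for another
  # project can be reused instead of being re-installed.
  packrat::set_opts(use.cache = TRUE, project = project)

  # Write packrat.lock without pulling sources; return elapsed seconds.
  timing <- system.time(
    packrat::.snapshotImpl(project, snapshot.sources = FALSE)
  )
  timing[["elapsed"]]
}

# Usage, from the root of an existing packrat project:
# quick_snapshot(".")
```

If several projects or users should share one cache, packrat also reads the R_PACKRAT_CACHE_DIR environment variable to decide where the cache lives; it would need to be set before packrat is loaded (for example in Renviron), and I have not checked how that interacts with your server setup.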