What are the main limits to R in a production environment?


#21

@cole - hmm, can you simply update 1 package without updating the rest? E.g.

Package: BH
Source: CRAN
Version: 1.62.0-1
Hash: 14dfb3e8ffe20996118306ff4de1fab2

simply change to

Package: BH
Source: CRAN
Version: 1.55.0-1
Hash: 14dfb3e8ffe20996118306ff4de1fab2

?
The simplest approach would be just to rewrite packrat.lock (as I did in the lines above), but this doesn’t work. And I’m not aware of any other way to specify a particular version of a package I want to install (up/downgrade).

I completely agree that 3rd-party dependencies need to be handled by users (this cannot be done by packrat).


#22

If you want to fix system dependencies to versions, I think (Docker) containers are the way to go. You can install specific versions with the system package manager as needed.

@mishabalyasin Regarding “re-install every single time”: IMHO this can be largely mitigated by (a) using the Rocker project’s images as base images (e.g. rocker/verse if you need dplyr; you also get all the MRAN advantages “for free” with version-tagged images) and (b) letting Docker Hub (or GitLab, or your own build server) build the images for you.

I must admit that these two pieces of advice don’t go well together.


#23

@xhudik that is a fantastic question. Sure thing! This is the sort of thing that takes some getting used to and could perhaps be improved. Steps to reproduce:

install.packages('BH')
packrat::snapshot()

The resulting packrat.lock:

PackratFormat: 1.4
PackratVersion: 0.4.8.1
RVersion: 3.4.2
Repos: CRAN=https://cran.rstudio.com/

Package: BH
Source: CRAN
Version: 1.65.0-1
Hash: 95f62be4d6916aae14a310a8b56a6475

Package: packrat
Source: CRAN
Version: 0.4.8-1
Hash: 6ad605ba7b4b476d84be6632393f5765

Now, if I want to force a package version, then I can edit the lock file as you mention. I usually delete the Hash: entry entirely, since this maps back to the version that I currently have installed. It seems to work fine without doing so though. Specifically:

PackratFormat: 1.4
PackratVersion: 0.4.8.1
RVersion: 3.4.2
Repos: CRAN=https://cran.rstudio.com/

Package: BH
Source: CRAN
Version: 1.55.0-1
Hash: 95f62be4d6916aae14a310a8b56a6475

Package: packrat
Source: CRAN
Version: 0.4.8-1
Hash: 6ad605ba7b4b476d84be6632393f5765
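
The hand edit above could also be scripted. Here is a minimal sketch in base R, assuming the lockfile keeps the plain "Field: value" layout shown here, with records separated by blank lines. set_lock_version() is a hypothetical helper, not a packrat function; it rewrites the Version field of one package and drops that record's Hash line, which is what I usually do by hand.

```r
# Hypothetical helper, not part of packrat: point one package record in a
# packrat.lock file at a different version and drop that record's Hash line.
set_lock_version <- function(lockfile, package, version) {
  lines <- readLines(lockfile)
  start <- which(trimws(lines) == paste("Package:", package))
  if (length(start) != 1) {
    stop("expected exactly one record for package '", package, "'")
  }
  # A record runs from its Package: line to the next blank line (or end of file).
  blanks <- which(trimws(lines) == "")
  end <- c(blanks[blanks > start], length(lines) + 1)[1] - 1
  record <- start:end

  version_line <- record[grepl("^Version:", lines[record])]
  if (length(version_line) != 1) {
    stop("expected exactly one Version field for package '", package, "'")
  }
  lines[version_line] <- paste("Version:", version)

  # The old hash belongs to the previously installed version, so remove it.
  hash_lines <- record[grepl("^Hash:", lines[record])]
  if (length(hash_lines) > 0) {
    lines <- lines[-hash_lines]
  }

  writeLines(lines, lockfile)
  invisible(lines)
}

# Demonstration on a throwaway copy of the records shown above.
demo_lock <- tempfile(fileext = ".lock")
writeLines(c(
  "PackratFormat: 1.4",
  "PackratVersion: 0.4.8.1",
  "RVersion: 3.4.2",
  "Repos: CRAN=https://cran.rstudio.com/",
  "",
  "Package: BH",
  "Source: CRAN",
  "Version: 1.65.0-1",
  "Hash: 95f62be4d6916aae14a310a8b56a6475",
  "",
  "Package: packrat",
  "Source: CRAN",
  "Version: 0.4.8-1",
  "Hash: 6ad605ba7b4b476d84be6632393f5765"
), demo_lock)

set_lock_version(demo_lock, "BH", "1.55.0-1")
edited <- readLines(demo_lock)
```

In a real project the path would be the project's own lockfile rather than a temp file. The sketch only touches the one record it is asked about, so the packrat entry keeps its version and hash.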

Then, call packrat::restore() to restore your state to that represented by your lockfile. You will get a nice little confirmation warning (this is where being able to build from source is important).

Note that packrat's internals get in the way here: it does another snapshot before restoring state, so I get 1.65.0-1 in my lockfile even though 1.55.0-1 is installed. This might be a bug / feature request (and might pair well with a set_lock_version function or something to make this process easier).

PackratFormat: 1.4
PackratVersion: 0.4.8.1
RVersion: 3.4.2
Repos: CRAN=https://cran.rstudio.com/

Package: BH
Source: CRAN
Version: 1.65.0-1
Hash: 95f62be4d6916aae14a310a8b56a6475

Package: packrat
Source: CRAN
Version: 0.4.8-1
Hash: 6ad605ba7b4b476d84be6632393f5765

The way to remedy that is with another packrat::snapshot(), but here we run into a note.

I typically appreciate the verbosity, but I politely tell packrat that I know what I’m doing with packrat::snapshot(ignore.stale=TRUE).

Now my lockfile is in the state that I expect, with a new Hash and packages in the state that I want:

PackratFormat: 1.4
PackratVersion: 0.4.8.1
RVersion: 3.4.2
Repos: CRAN=https://cran.rstudio.com/

Package: BH
Source: CRAN
Version: 1.55.0-1
Hash: d924d63d19a9615bdcb2548b534550f6

Package: packrat
Source: CRAN
Version: 0.4.8-1
Hash: 6ad605ba7b4b476d84be6632393f5765

Some noted points:

  • Per @Tazinho’s Christmas wish - it definitely is not a “just works” or “running smooth out of the box” kind of solution, but it has all the power and flexibility I want (especially with regard to installing specific commits from a git repo, archived source versions of local packages, etc.)
  • The reason for the “ignore.stale” requirement is that packrat does not know whether I want to keep 1.55.0-1 installed or whether the 1.65.0-1 in my lockfile is what I really want. Because of the version conflict, it checks to be sure I know what I am doing before overwriting the lockfile version. The warning might prompt me to say “Oh, I forgot to packrat::restore()! Woops!”
  • One of the big pain points that happens with packrat is when the R session terminates in the middle of an install and packages are left in a weird state. I’m not sure if this is crazy or not, but my response has typically been to just rm -r the folder in question and trust packrat to rebuild my dependencies from scratch.

Hope that helps! Packrat has been a life-saver for me, and I do lots of the version-munging that you mention. It would certainly be possible to have packrat::snapshot() be an automated part of the development submission process and packrat::restore() be an automated part of the release process. I have been bitten by this in the past: forget to do one or the other and things break during the release. “??? I tested this! Oh! Snap. I forgot to restore my dependencies on the new system.”
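
To catch the "forgot to restore" case, a release script could compare the lockfile against what is actually installed before going any further. A minimal sketch in base R, assuming the lockfile parses as DCF (the layout shown above does); lock_mismatches() is a hypothetical helper, not part of packrat, and the versions in the demonstration are just the ones from this thread.

```r
# Hypothetical release-time guard, not part of packrat: report packages whose
# installed version differs from the version recorded in the lockfile.
lock_mismatches <- function(lockfile,
                            installed = installed.packages()[, "Version"]) {
  records <- read.dcf(lockfile, fields = c("Package", "Version"))
  # The header record (PackratFormat, RVersion, ...) has no Package field.
  records <- records[!is.na(records[, "Package"]), , drop = FALSE]

  pkgs <- records[, "Package"]
  locked <- records[, "Version"]
  have <- unname(installed[pkgs])  # NA for packages that are not installed

  differs <- is.na(have) | is.na(locked) | have != locked
  data.frame(
    package = pkgs[differs],
    locked = locked[differs],
    installed = have[differs],
    stringsAsFactors = FALSE
  )
}

# Demonstration: the lockfile asks for BH 1.55.0-1 but 1.65.0-1 is installed.
demo_lock <- tempfile(fileext = ".lock")
writeLines(c(
  "PackratFormat: 1.4",
  "PackratVersion: 0.4.8.1",
  "RVersion: 3.4.2",
  "Repos: CRAN=https://cran.rstudio.com/",
  "",
  "Package: BH",
  "Source: CRAN",
  "Version: 1.55.0-1",
  "",
  "Package: packrat",
  "Source: CRAN",
  "Version: 0.4.8-1"
), demo_lock)

drift <- lock_mismatches(
  demo_lock,
  installed = c(BH = "1.65.0-1", packrat = "0.4.8-1")
)
print(drift)
```

In a release step, a non-empty result would be the cue to stop and run packrat::restore() first. Leaving the installed argument at its default reads the versions from the current library paths, which inside a packrat project should be the private library.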


#24

@nuest As I mentioned in “Internal CRAN-like system - best practices”, the inability of packrat (or checkpoint) to deal with system dependencies is the main reason we use the Nix package manager.

A recent example: a user requested sf, a package for spatial data which depends on a standard library from the GDAL project. It turned out that our version of Linux did not have packages for new-ish versions of GDAL. So the choice was between building GDAL from source (and then maintaining the install to keep it compatible with future upgrades of both R and R packages) and just running one Nix command:

nix-shell -p R rPackages.sf --run R

which takes care of building and caching correct versions of all C++ libraries and corresponding R packages. That command is guaranteed to keep working for all future upgrades of all the moving parts. If I need to throw some Python packages into the environment (tensorflow?), it is a matter of adding them to the command line, and Nix will make sure that all versions of everything are compatible with each other. Another bonus: if I so desire, the “sandbox” will be invisible to anybody else (including my other projects, which may require different versions of the tools). The command line can be replaced by a short script in the Nix language which I can then run to enter the sandbox.

Regarding Docker vs. Nix, I really like this post https://blog.wearewizards.io/why-docker-is-not-the-answer-to-reproducible-research-and-why-nix-may-be.

The reproducibility issue is solved automatically by the fact that the “recipes” for all Nix packages constitute a single entity, the Nix Packages collection (Nixpkgs), and it is trivial to pin a particular commit of Nixpkgs in either that command line or the corresponding Nix script.


#25

I think R being single-threaded is definitely an issue. It is surmountable, and it is not a problem for batch work in production, but it makes R hanging around as a service less great.

I wish I could make sense of packrat but Rocker + MRAN has been a huge help for dependency lockdown.

But honestly I think most of the problem is R developers not coming from a culture of software development and therefore having neither the tools nor the practices expected to do the work of moving R to production.


#26

packrat is generally really great, but unfortunately it has a lot of bugs and situations where it doesn’t work, and as such it breaks many automatic deployment scripts (besides being extremely slow for bigger dependencies).
On the other hand, development is still active, the maintainer is very helpful, and many bugs are fixed in the development version. This does not help for production, though, as there has been no stable release since 2016.
So in the end I have quite mixed feelings. Given that it is some kind of “RStudio”-supported package, a clearer roadmap and/or strategy would be very helpful.


#27

Just to mention, here is also a nice writeup on checkpoint, packrat and docker.


The conclusion is that packrat and docker are the best options.
In my personal opinion there are still issues with docker, and especially on Windows docker is not without headaches (and I would not recommend it). However, the comments on the blog post also contain an interesting discussion of further drawbacks.


#28

8 posts were split to a new topic: Questions about R in production


#31

This topic was closed. If you have questions related to it, we encourage you to start a new thread.