Using R and conda

anaconda

#1

I am using R on a shared server. Not having root permissions makes some things difficult. I recently started using conda, which has been working well for specific tools. Everything Python-based worked for me, which makes sense considering conda was developed by Python users. I recently needed to use an R package that had Python-based dependencies. It seemed like a great use case for conda. Unfortunately, I keep running into problems with R.

  • The most obvious is that R 3.5 is still not available (after 3 months) on either conda-forge or r channels. This is not a problem for most packages, but the latest version of Bioconductor needs the latest version of R.
  • When R is available, it can be either the standard version or the Microsoft R. MRO is fine, but it has its own caveats. It also splits the entire community, since different conda packages depend on different R flavors.
  • My original plan was to use r-base and just install all the packages normally (via install.packages() or biocLite()). Unfortunately, I quickly ran into problems with packages like rJava and rlang, which have external dependencies. I couldn't even install some mainstream packages like tidyverse. You can install individual R packages as conda packages, but if you have several, you quickly run into dependency problems. After a few, it wanted me to downgrade to R 3.3. I should admit that the biggest problem for me was probably hdf5r, which is more obscure, so fewer combinations are possible. What's frustrating is that, depending on the combination of packages, the results can be very different.
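One thing that may be worth trying for the dependency problems above is to request every R package in a single conda command, so the solver has to find one consistent set of versions instead of being asked to adjust the environment after each install. The following is only a sketch: the environment name r-env is made up, and the conda-forge package names (r-tidyverse, r-rjava, r-hdf5r) are assumptions that may or may not exist for your R version. By default the script only prints the command; set DRY_RUN=0 to actually run it.

```shell
#!/bin/sh
# Sketch: build one "conda create" call that lists all R packages at once.
# ENV_NAME and the package list are illustrative assumptions, not tested recipes.
ENV_NAME="r-env"
PACKAGES="r-base r-tidyverse r-rjava r-hdf5r"

CMD="conda create --yes --name $ENV_NAME --channel conda-forge $PACKAGES"

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Default: only show what would be executed.
  echo "$CMD"
else
  # Run the solve and install in one step.
  $CMD
fi
```

If the solver cannot satisfy the whole list, it should fail up front rather than silently downgrading R halfway through, which at least makes the conflict visible.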

I looked around to see if there are any tutorials or suggestions. All the guides show how to install one package, which not surprisingly works fine. As soon as you try to make any customizations, errors start appearing. Am I doing it wrong or is it really this hard? Is there some community effort to improve the situation? For example, it would not be completely unreasonable if RStudio had created a "base R" package that included a clean copy of R with all the common dependencies.


#2

From what I hear, this is a pretty common story whenever someone tries to use conda R. Theoretically, I'm sure it's possible to get it to work right, but I suspect it'd take a non-negligible amount of work that nobody who knows the requisite turf appears inclined to do.

If you can get it to do what you want without going crazy, great. Otherwise, it's worth asking why you want to use conda. If you want a programmatic way to install and update, there are more general package managers that can install CRAN R (the built-in one on Linux, Homebrew Cask on Mac, maybe choco on Windows). If you want package checkpointing to avoid conflicts (not that I've seen...any, really), using MRAN or pacman or similar may help. For self-contained environments, the rocker Docker images are handy. If you want to enable R kernels in Jupyter, you don't actually need conda R for that.

Regardless of your installation, installing some things (e.g. rJava) will likely still be a pain. (It does enable some cool packages, though.)


#3

To clarify, I would like to have a custom Python and R environment on a server without root access. The ideal answer to the first part is probably Docker, but the second part makes that impossible. Conda seems to be a perfect alternative in theory, but as I highlighted above, it's easy to run into dead ends.

Since I initially posted this, I started experimenting with more complete conda packages, for example, those built for R packages with extensive dependencies. Ironically, some of them install, but then the one package they are designed for does not actually load.


#4

As I mentioned elsewhere, Nix is a great way to install and maintain software with dependencies. The drawback is that you (briefly) need root access during the initial install to create the /nix directory. After that, you never need root access again. The upside is that everything you install with Nix is compatible, so, for instance, if you do

nix-shell -p R rPackages.dplyr rPackages.rJava python pythonPackages.numpy --run R

you will get a sandbox environment with R, dplyr, rJava (with the correct version of the JRE), Python, and numpy, where everything works regardless of whether your host OS has the correct Java, shared libraries, etc.
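To check that claim on your own machine, something like the sketch below may help. It reuses the package attributes from the command above and replaces the interactive R session with a non-interactive check that loads the libraries from both languages. The check command itself is my assumption about what a reasonable smoke test looks like, and the script only prints the command if nix-shell is not installed.

```shell
#!/bin/sh
# Sketch: smoke-test the sandbox by loading the R and Python packages once.
# Package attributes are taken from the nix-shell command above.
NIX_PKGS="R rPackages.dplyr rPackages.rJava python pythonPackages.numpy"

# Assumed check: both commands exit non-zero if a package fails to load.
CHECK="Rscript -e 'library(dplyr); library(rJava)' && python -c 'import numpy'"

if command -v nix-shell >/dev/null 2>&1; then
  # NIX_PKGS is intentionally unquoted so each package is a separate argument.
  nix-shell -p $NIX_PKGS --run "$CHECK"
else
  echo "nix-shell not found; would run: nix-shell -p $NIX_PKGS --run \"$CHECK\""
fi
```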

We do have Anaconda for purely Python projects because the learning curve is lower, but it is often painful to use for the reasons you mentioned, so we are gradually moving towards using Nix for everything.