Running RStudio Package Manager on HPC

It's fantastic that you offer x86_64 binaries for R packages on CentOS. However, one of the chief reasons I use R is to multiply big matrices. With my beloved Emacs, the open source version of RStudio, and Shiny Server, I managed to get everything set up so that my colleagues can get their feet wet with R without having to configure or install anything at all. Everything just works and is ludicrously fast, because I got my friendly HPC-obsessed system administrators to link R against OpenBLAS, and I went to the trouble of figuring out how to set up site libraries where every R package is compiled once and installed by default. Everyone loves it, and I've been pretty successful getting people on board with writing R code on the system. However, at this point only a few others at my company and I understand how to deploy apps for other users.

Now I've been pushing very hard to get us moved over to RStudio Teams and RStudio Package Manager. When it works, it really makes deployment a breeze. However, the approach in Teams that makes this possible pushes everyone toward a workflow closer to the Python-centric one, where everyone has to spend a large amount of time up front (before new people have necessarily bought in to R) getting their environment set up so they can deploy easily later if they want. That works okay if all the binary packages they use are readily available. However, it leaves HPC users out in the cold, because they would prefer packages precompiled against MKL or OpenBLAS. We would love to compile these ahead of time ourselves, but can this use case be documented in the RStudio Package Manager Admin Guide? Alternatively, could RStudio just build OpenBLAS or MKL versions of the packages as well?

We put together an internal R training class and gave it in May using our open source system. After that class, and after our advertising of easy Shiny app deployment with Teams, the leader of another team got excited and wanted us to give the class again, this time on the RStudio Teams system. Unfortunately, we decided to postpone the training, because right now getting the Tidyverse up and running takes over an hour when everything has to be compiled from source. This makes R seem like something for specialized computational scientists, and not something every researcher should be using instead of Excel. That is obviously not the impression we want to give. So, I'm begging you to think about adding documentation to the Admin Guide for setting up binary package management with custom compiled R packages, and also to consider adding the most common custom configurations (e.g. x86_64 with OpenBLAS and MKL) to your set of package offerings.

Thanks for letting me get that off my chest :smiley:.


Hi,

RSPM builds binary packages against the shared BLAS*, which can be swapped out with a compatible BLAS implementation like OpenBLAS, ATLAS, or MKL. If you install R from EPEL or from RStudio's precompiled R binaries (https://docs.rstudio.com/resources/install-r/), you'll get a shared BLAS setup with OpenBLAS swapped in already.

If you want to self-compile R and use RSPM binaries, you'll have to compile R without the --with-blas and --with-lapack configure flags to use shared BLAS, and then replace the shared BLAS with a symlink to your alternative BLAS library, as described in https://cran.r-project.org/doc/manuals/r-release/R-admin.html#Shared-BLAS.
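For the self-compiled case, the symlink step might look roughly like the sketch below. `swap_rblas` is a hypothetical helper name, not something shipped with R or RSPM, and the OpenBLAS path in the usage comment is only an example taken from the R-admin manual; yours will differ. It assumes R was built with the shared BLAS, so that `libRblas.so` exists under R's `lib` directory.

```shell
#!/bin/sh
# swap_rblas R_LIB_DIR BLAS_LIB   (hypothetical helper, for illustration only)
# Back up R's bundled reference BLAS and point libRblas.so at BLAS_LIB instead.
swap_rblas() {
    r_lib_dir=$1
    blas_lib=$2
    target="$r_lib_dir/libRblas.so"

    # Refuse to create a dangling symlink.
    [ -f "$blas_lib" ] || { echo "BLAS library not found: $blas_lib" >&2; return 1; }

    # No libRblas.so usually means R was not built with the shared BLAS.
    [ -e "$target" ] || [ -L "$target" ] || {
        echo "no libRblas.so in $r_lib_dir" >&2
        return 1
    }

    # Keep the original so the swap can be undone; never overwrite an existing backup.
    if [ ! -L "$target" ] && [ ! -e "$target.keep" ]; then
        mv "$target" "$target.keep"
    fi

    ln -sf "$blas_lib" "$target"
}

# Example (run as a user who can write to R's lib directory; paths are assumptions):
#   swap_rblas "$(R RHOME)/lib" /usr/lib64/libopenblasp.so.0
```

To undo the swap, remove the symlink and move `libRblas.so.keep` back into place.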

These details aren't documented in the Admin Guide, but we'll look into fixing that.

*edit: RSPM actually builds CentOS binary packages against OpenBLAS right now, but that's a bug that should be fixed soon.

Thanks, that's very helpful info!

As a follow-up, I asked our System Administrators to install R with MKL using the shared BLAS setup. Unfortunately, it is not working.

Running the code below crashes R and prints an MKL linking error:

system.time({ x <- replicate(5e3, rnorm(5e3)); tcrossprod(x) })

In RStudio Pro, running this code causes the whole IDE to hang, with no error printed.

However, if we open the Terminal pane and start R there, we do see the error:

INTEL MKL ERROR: /opt/R/R-4.0.2-mkl/lib64/R/lib/libmkl_core.so: cannot open shared object file: No such file or directory.
Intel MKL FATAL ERROR: Cannot load libmkl_core.so.

Do you have any tips or suggestions for setting up the shared BLAS?

I think you need to add some other MKL libraries to the shared library search path. One way to do this would be to use the ld.so cache, as described here: https://github.com/eddelbuettel/mkl4deb#integrate-mkl

echo "/opt/intel/lib/intel64"     >  /etc/ld.so.conf.d/mkl.conf
echo "/opt/intel/mkl/lib/intel64" >> /etc/ld.so.conf.d/mkl.conf
ldconfig
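If you're not sure where your MKL install actually lives, a slightly more defensive version of the same idea is sketched below. `write_ld_conf` is a hypothetical helper name; it only writes directories that exist, so a typo in a path shows up as a warning rather than a silently useless cache entry. The `/opt/intel/...` paths are the ones from the mkl4deb instructions and may not match your system.

```shell
#!/bin/sh
# write_ld_conf CONF_FILE DIR...   (hypothetical helper, for illustration only)
# Write each existing DIR to CONF_FILE, one per line, skipping missing ones.
write_ld_conf() {
    conf_file=$1
    shift

    # Start from an empty file so reruns do not accumulate duplicate lines.
    : > "$conf_file" || return 1

    for dir in "$@"; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir" >> "$conf_file"
        else
            echo "skipping missing directory: $dir" >&2
        fi
    done
}

# Example (as root; paths are assumptions):
#   write_ld_conf /etc/ld.so.conf.d/mkl.conf \
#       /opt/intel/lib/intel64 /opt/intel/mkl/lib/intel64
#   ldconfig                            # rebuild the cache
#   ldconfig -p | grep libmkl_core      # confirm the library is now listed
```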

Another way would be to set LD_LIBRARY_PATH for the R process, like:

export LD_LIBRARY_PATH="/opt/intel/lib/intel64:/opt/intel/mkl/lib/intel64:$LD_LIBRARY_PATH"
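One small caveat with that one-liner: if LD_LIBRARY_PATH was empty beforehand, the result ends in a trailing colon, and the dynamic loader treats an empty entry as the current working directory. A sketch that avoids this (and avoids duplicate entries) is below; `prepend_ld_path` is a hypothetical helper name and the paths are again assumptions.

```shell
#!/bin/sh
# prepend_ld_path DIR   (hypothetical helper, for illustration only)
# Put DIR at the front of LD_LIBRARY_PATH unless it is already listed,
# without leaving an empty entry when the variable was unset.
prepend_ld_path() {
    new_dir=$1
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$new_dir:"*)
            ;;  # already present, nothing to do
        *)
            LD_LIBRARY_PATH="$new_dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
            ;;
    esac
    export LD_LIBRARY_PATH
}

# Example (paths are assumptions):
#   prepend_ld_path /opt/intel/mkl/lib/intel64
#   prepend_ld_path /opt/intel/lib/intel64
#   R
```

Keep in mind this only affects R started from that shell. Sessions launched by RStudio Server don't inherit it; I believe the `rsession-ld-library-path` setting in rserver.conf is meant for that case, but please check the admin guide for your version.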

The R documentation recommends the first way, if it helps: https://cran.r-project.org/doc/manuals/r-devel/R-admin.html#External-software
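Whichever way you go, it may help to check that `libmkl_core.so` is actually findable before starting R. A rough sketch follows; `find_lib` is a hypothetical helper name that just walks a colon-separated search path (LD_LIBRARY_PATH by default) and prints the first match. It does not look at the ld.so cache, so use `ldconfig -p` for that.

```shell
#!/bin/sh
# find_lib LIB_NAME [SEARCH_PATH]   (hypothetical helper, for illustration only)
# Print the first DIR/LIB_NAME that exists in the colon-separated SEARCH_PATH
# (defaults to LD_LIBRARY_PATH). Returns non-zero if nothing is found.
find_lib() {
    lib_name=$1
    search_path=${2:-${LD_LIBRARY_PATH:-}}

    old_ifs=$IFS
    IFS=:   # split the search path on colons
    for dir in $search_path; do
        if [ -n "$dir" ] && [ -f "$dir/$lib_name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir/$lib_name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Example: check LD_LIBRARY_PATH first, then fall back to the ld.so cache.
#   find_lib libmkl_core.so || ldconfig -p | grep libmkl_core
```

If neither check turns anything up, the loader will fail the same way as in the error above.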