Using Linux Lmod with R and RStudio

Hello R-Admin community,

I am posting here to ask about what seems to be fairly advanced Linux territory for the R community. I have not found much information on this for R users and admins.

In our server environment, we use a tool called Environment Modules so that users can modify their current bash environment to use specific scientific tools. Specifically, we use the Lmod implementation (https://lmod.readthedocs.io/en/latest/). We use modules to create environments for tools like LaTeX, Python, AMPL, XPRESS, or even base R with special .libPaths().

Integration with modules is now supported in RStudio Server Pro, to load modules along with a specific R version. See https://docs.rstudio.com/ide/server-pro/r-versions.html#extended-r-version-definitions
This is useful, but it means having a predefined set of R versions, each tied to some modules, and Lmod is only available to the R ecosystem through an RStudio product.

I am still trying to work out the possible strategies for integrating modules with R.

I am interested to know if anyone has experience with any of this.

Can we use Lmod modules from R to modify the R session?
From my tests, it does not seem possible. Calling system("module load my_module") from inside an R process does not update R's environment. I am not sure it is even possible.

One way to achieve this currently is to load the module in ~/.bashrc to modify the environment just before R is launched. Even in Rprofile it seems too late.

Does anyone know of a tool that helps deal with modules in R? Is this even possible?
Before digging into some tricky workaround, or creating a new tool, I would like to know what others think.

Regarding RStudio Server Pro support, an interesting new feature would be to let a user configure, as a project option, which modules to load for that project. I don't know where the best place is to submit such a feature request.

Thanks all. Anyone who has any clue or experience regarding modules (like Lmod) and R is more than welcome to chime in!

You are able to do this now.

If you add /etc/rstudio/rsession-run, you can add env variables and scripts for EACH RStudio session:
#!/bin/bash
LD_LIBRARY_PATH=$R_HOME/lib:$LD_LIBRARY_PATH
PATH=$R_HOME/bin:$PATH
MANPATH=$R_HOME/share:$MANPATH
TMPDIR=/local1/tmp
export MANPATH PATH LD_LIBRARY_PATH TMPDIR
source /home/biotools/tex/current/PKG_PROFILE
exec "$@"

Alternatively, you can use the new r-versions and have MULTIPLE profiles and environments.

Path: /home/biotools/r/R-3.6.1
Label: Current Production Version
Script: /etc/rstudio/rsession-run.sh

It is possible. However, calling system("module load my_module") from inside an R process, as you do, only sets the variables in the subprocess spawned by the system call, and that subprocess goes away when the call returns to R.

You want to change the environment of the process R is running in.
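The point about subprocesses can be seen without R or Lmod at all. The following is a minimal sketch, using a made-up variable name (DEMO_MODULE_VAR) in place of whatever a real module would set: a child shell exports a variable, and the parent never sees it.

```shell
#!/bin/sh
# DEMO_MODULE_VAR is a hypothetical name used only for this illustration.
# Make sure it is not already set in this shell.
unset DEMO_MODULE_VAR

# The child shell exports the variable, can see it, then exits.
child_view=$(sh -c 'export DEMO_MODULE_VAR=loaded; echo "$DEMO_MODULE_VAR"')

# Back in the parent, the variable was never set.
parent_view=${DEMO_MODULE_VAR:-unset}

echo "child:  $child_view"
echo "parent: $parent_view"
```

R's system() behaves like the parent here: whatever the spawned shell exports is gone once that shell exits.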

Sys.setenv(myvar="myval") can be used to set a single environment variable. A step in the right direction, but not enough.

readRenviron lets you "Set Environment Variables from a File"

So, if you had a process to build such a file "on the fly" from your module, you could save its output as a temp file, load the temp file, and delete it.

Here is a process that works from the command line for the module named "cuda":

mymod=cuda sh -c 'eval "$(modulecmd bash load ${mymod})" ; env'

So, build strings like that in R, and pass them to system (piping their output to a temp file).

Or, probably better, use R's system2 function to pass the value of mymod in the environment and to specify where stdout should go, and tempfile to get a safe place to put that stdout.

Then load that temp file with readRenviron ... and don't forget to delete it.
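The shell half of that recipe might look roughly like the sketch below. The helper name env_after is hypothetical (it is not part of Lmod or R), and the demo uses a plain export as a stand-in for a real module load so that it runs on a machine without modulecmd. It keeps only the variables that the setup step added or changed, which keeps the file small. Values that span several lines, or that contain characters readRenviron treats specially, would need extra care.

```shell
#!/bin/sh
# snapshot SETUP: run SETUP in a child shell, then print that shell's
# environment as sorted NAME=value lines. Lines that do not look like
# NAME=value (e.g. continuation lines of multi-line values) are dropped.
snapshot() {
    sh -c "{ $1 ; } >/dev/null 2>&1; env" \
        | grep -E '^[A-Za-z_][A-Za-z0-9_]*=' \
        | LC_ALL=C sort
}

# env_after SETUP: hypothetical helper. Prints the variables that are new
# or changed after SETUP, compared with a child shell that ran nothing.
env_after() {
    before=$(mktemp) || return 1
    after=$(mktemp) || { rm -f "$before"; return 1; }
    snapshot ':' > "$before"
    snapshot "$1" > "$after"
    # Lines found only in the second file were added or changed by SETUP.
    LC_ALL=C comm -13 "$before" "$after"
    rm -f "$before" "$after"
}

# Stand-in for a real module load, which (assuming modulecmd is on PATH,
# as in the command above) would be:
#   env_after 'eval "$(modulecmd bash load cuda)"' > "$envfile"
envfile=$(mktemp)
env_after 'export DEMO_TOOL_HOME=/opt/demo' > "$envfile"
cat "$envfile"
```

From R, the resulting file is what you would hand to readRenviron before deleting it.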

But why do you really want to do this? And do you really want to do it in the middle of an ongoing R session? I suppose it would be a good idea if you were in the middle of a long-running R session with lots of state and wanted to use one of your environment module programs without losing that state.

And if you want to do all this under RStudio Server and are not the admin of the server, then you need to get some cookies for the admin so that they arrange for environment modules to be available at all. The cookies might have to be special if you're using the open source or "Pro" version of RStudio Server.

Might this work on Windows? No idea.

Let us know if you put the pieces together.

I don't see any other way of doing it.


Thanks for all this useful information.

I am the admin of our clusters, and we use Linux Lmod modules to help users deal with different environments (Python, R, optimisation tools, other tools, ...). This is used heavily in our Slurm clusters too.

On our RStudio Server Pro cluster, R users can't easily use modules to load the correct environment values, for example to use optimisation tools (like AMPL or XPRESS). It would be interesting to be able to do so, and per project.
Currently, we can load modules through RStudio Server Pro support, but that is per R version, not per project.
Otherwise, a user has to modify their .bashrc to load the wanted modules. .bashrc is run before R is launched, so it works, but it is not per project: every R session will have the modules.

A tool that captures the result of a module load and then loads it into R would be really interesting. I will look into that. Thank you!

Hi,

we have a (custom) integration between RStudio Pro and environment modules (Lmod 6; Lmod 7 broke it).

We provide metamodules based on Bioconductor releases (our users prefer to base the releases on Bioconductor and not on R itself). Essentially, each release has its own libPath.
That Bioconductor module is what we present to our end users in the RStudio version selection.
So, if someone wants to load Bioconductor version 3.7, he/she has to select Bioconductor/3.7-R3.4.3 from the list of available R versions.

How? We have some wrappers in the system:

  1. A custom /usr/bin/R that preloads the default R version based on Bioconductor (latest). This way we make sure that the default RStudio session will always load the latest Bioconductor module.
#!/bin/bash
[...]
. /etc/profile #(this makes the module function available)
ml load Bioconductor/10
echo "done loading Bioconductor/10"|logger -t main-R
exec R "${@}"
  2. In the r-versions file we add a directory for each R (Bioconductor) version we want to make available, something like:
versions/Bioconductor-3.8/
versions/Bioconductor-3.9/
versions/Bioconductor-3.10/

and, in each directory, we create bin/R for each of the above; this R is also a wrapper that preloads the Bioconductor module the user has selected.
So, for example, for 3.8 it would be:

./versions/Bioconductor-3.8/bin/R

#!/bin/bash
[...]
. /etc/profile
ml load module
echo "done loading Bioconductor/8"|logger -t custom-R
exec R "${@}"
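For anyone who wants to reproduce this layout, the per-version wrappers could be generated with a loop like the sketch below. This is not our actual tooling: the root directory, the module naming scheme Bioconductor/<version>, and the version list are assumptions for illustration, and the generated wrappers leave out the elided parts and the logger line of the scripts above. The demo writes into a temporary directory so it can be run safely anywhere.

```shell
#!/bin/sh
# Hypothetical generator for the per-version bin/R wrappers.
# root is a throwaway directory for the demo; a real setup would point it
# at the directory referenced from the r-versions file.
root=$(mktemp -d)

for version in 3.8 3.9 3.10; do
    bindir="$root/versions/Bioconductor-$version/bin"
    mkdir -p "$bindir"

    # Unquoted heredoc: $version is expanded now, while \${@} is escaped
    # so the wrapper forwards its own arguments at run time.
    cat > "$bindir/R" <<EOF
#!/bin/bash
. /etc/profile   # makes the module function available
ml load Bioconductor/$version   # assumed module naming scheme
exec R "\${@}"
EOF
    chmod +x "$bindir/R"
done

ls "$root/versions"
```

Each generated bin/R still relies on the module putting the real R first on PATH, exactly as in the hand-written wrappers.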

This has been working nicely for the last 2 years. As I mentioned before, Lmod 7 broke it because they introduced R.in, and the definition of the function is, for some reason, wrong when being called from RStudio.

Also, the integration between modules and RStudio that you mention does not work as one would expect. You might think that you could list each module you want to make available as a version, something like:

Path: /MyPath/.../version8
Label: My version 8
Module: module/8

Path: /MyPath/.../version9
Label: My version 9
Module: module/9

and then, when a user selects one or the other version, RStudio would load module 8 or 9 before starting the new session. But this is not how it works (ask support for further details).

HTH,
Arnau


Wow! Thanks a lot, that is an amazing experience to share. It will help a lot!

This is exactly what I thought.

I'll see what support has to say, or whether someone from the RStudio team adds something.

Thanks again.

Hi,

sorry, but I forgot to mention that it's mandatory to run a different R version in each module. Otherwise it does not work. In our case each Bioconductor release runs under a different R version.

Best,
Arnau