I am trying to set up RStudio Server on our HPC cluster before we maybe buy the Workbench version. I can successfully connect via my browser to RStudio Server on the master node, but when I try to do the same on a compute node I cannot establish a connection. Does anybody have experience with this?
Can you please elaborate a bit on the problems you are facing when running RStudio Server on the compute nodes? How did you set up RStudio Server on the compute nodes? Is it perhaps a network firewall issue, or the integration with the scheduling system?
More than happy to follow up over 1:1 messages and then inform the wider community of the outcome of our conversation.
The problem boils down to me having no clue whether I even need to install RStudio Server on the compute nodes. I installed RStudio Server on the master node from the RPM package for CentOS 8, and I have rserver.conf in /etc/rstudio configured so that it loads a specific R version. We use SLURM as the scheduling system. If I want to run a job on compute node 1, for instance, I start an interactive session and then load the R version via environment modules (module load). In the documentation I have seen that I need to install the RStudio Workbench session components on each compute node; I will try this tomorrow and report back.
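For reference, pointing RStudio Server at a specific R installation only takes a couple of lines in rserver.conf. A minimal sketch, assuming an RPM-default layout; the R path shown is an example, not your actual module path:

```
# /etc/rstudio/rserver.conf (sketch)
# Use a specific R build instead of whatever is first on the PATH
rsession-which-r=/opt/sw/R/4.2.1/bin/R
# Default listening port for the web UI
www-port=8787
```

After editing, restart the service (e.g. `sudo systemctl restart rstudio-server`) for the change to take effect.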
I can clearly see what you are trying to achieve. However, parts of it are not possible with the free version of RStudio Server (cf. RStudio Server Open Source vs. RStudio Workbench Comparison - RStudio).
For your HPC setup it is mostly the Launcher component that is missing in the free version. The Launcher would let you use the session components installed on each node: you could simply log into the main RStudio Server and launch sessions on the compute nodes, and those sessions would then run as normal SLURM jobs. So installing the session components may not be very useful at the moment.
I see three options for you to utilise the HPC cluster:
- Leave your current setup as-is. Use RStudio Server on the master/head node and utilise appropriate R packages to remotely submit jobs to the HPC cluster (e.g. slurmR). For a higher-level abstraction you could then look into the future package with the appropriate backends. For use cases where a single user wants a fair amount of compute resources (e.g. bootstraps, parameter studies, or embarrassingly parallel work in general), this can go a long way.
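To make the first option concrete, here is a rough R sketch of both approaches. It assumes slurmR and future.batchtools are installed, that the RStudio Server session runs on a SLURM submit host, and that the partition name "compute" and the template file name are placeholders for your site's values:

```r
# Sketch: submitting work to SLURM from an R session on the head node.
library(slurmR)

# Run 100 independent tasks as a SLURM job array split over 10 jobs;
# "compute" is an example partition name.
ans <- Slurm_lapply(1:100, function(i) mean(rnorm(1000)),
                    njobs = 10, plan = "collect",
                    sbatch_opt = list(partition = "compute"))

# Higher-level alternative: the future ecosystem with a SLURM backend.
library(future.batchtools)
plan(batchtools_slurm, template = "slurm.tmpl")  # template file you provide
f <- future(mean(rnorm(1000)))
value(f)
```

The appeal of the future approach is that the same user code also runs unchanged with a local backend (`plan(multisession)`), so users can develop interactively and scale out later.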
- Build Singularity integration into your SLURM cluster and use RStudio Server baked into a Singularity image. You could then provide wrapper scripts that submit a job to the SLURM cluster, which runs the Singularity image and starts RStudio Server on the allocated compute node. Barring any network firewall problems, the user can then connect to the RStudio Server instance on that compute node.
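A minimal sketch of such a wrapper, assuming an image `rstudio.sif` (e.g. built from the rocker/rstudio Docker image) and illustrative resource requests; it only writes the batch script and tells the user how to submit it:

```shell
#!/usr/bin/env bash
# Generate a SLURM batch script that runs RStudio Server from a
# Singularity image on the allocated compute node (sketch; names
# like rstudio.sif and the resource limits are examples).

cat > rstudio-singularity.sbatch <<'EOF'
#!/usr/bin/env bash
#SBATCH --job-name=rstudio
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
#SBATCH --time=08:00:00
#SBATCH --output=rstudio-%j.log

# Pick a port in a range that is hopefully free and tell the user
# (in the job log) where to point the browser.
PORT=$(shuf -i 8000-9000 -n 1)
echo "Point your browser at http://$(hostname -f):${PORT}"

# Bind a writable run directory and start rserver inside the container.
mkdir -p "${HOME}/rstudio-run"
singularity exec \
  --bind "${HOME}/rstudio-run:/var/lib/rstudio-server" \
  rstudio.sif \
  rserver --www-port "${PORT}" --server-user "${USER}"
EOF

echo "Wrote rstudio-singularity.sbatch; submit it with: sbatch rstudio-singularity.sbatch"
```

The job log then contains the node name and port, which is also where any firewall rules between login and compute networks would have to allow traffic.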
- Request a demo and/or a free trial license for the commercial product so you can explore the Launcher functionality and implement an architecture as shown here.