RStudio Server Pro cluster: Separating traffic based on group

Our company uses a single RStudio Server Pro cluster in a load-balanced setup to serve all the R users in our division. Other parts of the company are now interested in using R, and I'm planning the expansion of our cluster to those groups. The company has several divisions, and each R user will be associated with one of them. We would like a single heterogeneous cluster with a mix of nodes "owned" by the different divisions, so that server costs can be allocated accordingly, each division's data stays isolated, and heavy usage by one division's users doesn't cause downtime for the others.

We can certainly load balance users to specific nodes based on user groups and have that working in our tests so far. My questions are as follows:

  1. Has anyone had experience with this heterogeneous setup in a single cluster before? Or do you simply create multiple clusters, one for each division? There are obviously pros and cons to each approach.
  2. For the users who end up doing work across multiple divisions (a small but growing set), is it possible to make R sessions in the workspace manager "sticky", so that a suspended session spins up on the proper set of nodes when you relaunch it? Or is there a way to isolate the ~/.rstudio/sessions folder, which stores session history and info, so that a user only sees projects associated with a particular division?

Any other general thoughts or advice you may have as we make this step from supporting one set of R users to supporting many would be appreciated.

Tagging colleagues to make sure they take notice: @thomas, @Tanner, @navameen, @Vineesh, @Siddharth

Thanks for the detailed explanation of your use case. More interest in R is a great problem to have!

There are two strategies to consider for load balancing RSP users on specific nodes or groups of nodes.

1 - Custom Load Balancing in RSP:

RSP supports several load balancing strategies across multiple nodes. One of them is a custom balancing method that calls out to a script and passes it the username and the list of nodes. You could use this to balance users onto specific nodes or groups of nodes, but you have to write and maintain the routing logic yourself:

https://docs.rstudio.com/ide/server-pro/load-balancing.html#custom
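
To make approach #1 a bit more concrete, here is a minimal sketch of what such a script might look like in Python. It assumes, as I understand the linked docs, that the server passes the username and a comma-separated node list in the `RSTUDIO_USERNAME` and `RSTUDIO_NODES` environment variables and reads the chosen node from standard output; please verify that against the docs for your version. The user-to-division and division-to-node tables, the addresses, and the `choose_node` helper are all hypothetical placeholders.

```python
#!/usr/bin/env python3
"""Sketch of a custom balancer that routes users to their division's nodes."""
import os
import sys

# Hypothetical mappings; in practice these might come from LDAP groups or a file.
USER_DIVISION = {"alice": "finance", "bob": "research"}
DIVISION_NODES = {
    "finance": {"10.0.1.10:8787", "10.0.1.11:8787"},
    "research": {"10.0.2.10:8787"},
}


def choose_node(username, available_nodes,
                user_division=USER_DIVISION, division_nodes=DIVISION_NODES):
    """Return the first available node owned by the user's division.

    Falls back to the first available node if the user has no division or
    none of that division's nodes are up; returns None if no nodes are given.
    """
    owned = division_nodes.get(user_division.get(username), set())
    for node in available_nodes:
        if node in owned:
            return node
    return available_nodes[0] if available_nodes else None


def main():
    # Assumed interface: username and node list arrive as environment variables.
    username = os.environ.get("RSTUDIO_USERNAME", "")
    raw_nodes = os.environ.get("RSTUDIO_NODES", "")
    nodes = [n.strip() for n in raw_nodes.split(",") if n.strip()]
    node = choose_node(username, nodes)
    if node is None:
        print("no nodes available", file=sys.stderr)
    else:
        # Echo the node exactly as it was passed in (address and port).
        print(node)


if __name__ == "__main__":
    main()
```

The fallback to any available node is a design choice. If strict isolation between divisions matters more than availability, you would probably want the script to refuse instead of falling back, and how the server reacts when the script prints nothing is something to test yourself.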

2 - RSP Launcher:

Last week, we announced functionality in RStudio 1.2 that includes a Job Launcher that can start R sessions on external systems such as batch schedulers and container orchestration platforms. Launcher will initially ship with plugins for Kubernetes and SLURM, and additional plugins can be developed for other systems. In this case, load balancing and per-node access control would be handled by those systems, which are designed to do this in a more scalable way:

https://docs.rstudio.com/ide/server-pro/1.2.1244-1/job-launcher.html

You might consider implementing approach #1 in the short term and planning to move to approach #2 later. Or, you could get started with approach #2 right away using the RSP 1.2 preview and see how it works for you:

https://www.rstudio.com/products/rstudio/download/preview/

Let us know how we can help along the way.

Thanks @koverholt, I really like your suggestion for the job launcher. That seems like a promising way to tackle this problem, and we'll definitely look into that approach! We may get started with approach #1 above for now.
