RStudio Server system sizing & capacity planning

Hi,
I was able to find this URL for RStudio Server system sizing

However, I am wondering if there is any system sizing/capacity planning tool or guide available that would help with deploying RStudio Server at the enterprise level.

I have worked with other, unrelated products in the past where the vendor had put together a system sizing tool: once you entered some prerequisite information, it would give a general idea of how much RAM, CPU, and storage, and possibly how many servers, you might need for load balancing.

Hello!

This is a fantastic question. Unfortunately, the answer is usually pretty dependent on the type of work that R users on your server will be doing. It is important to keep in mind that:

  • R is memory intensive and often copies objects when they are modified (i.e. a 10GB dataset could be expected to use 20GB+ of RAM)
  • R is generally single threaded (although there are packages that expand this functionality)
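As a rough illustration of the first point, here is a minimal back-of-the-envelope sketch in R. The function name `estimate_ram_gb`, the `copy_factor` of 2 (taken from the "10GB could use 20GB+" figure above), and the idea of multiplying by the number of concurrent sessions are all assumptions for illustration, not an official sizing formula.

```r
# Hypothetical rule-of-thumb helper; not an official RStudio sizing formula.
# copy_factor = 2 reflects the "10GB dataset -> 20GB+ of RAM" estimate above.
estimate_ram_gb <- function(dataset_gb, concurrent_sessions, copy_factor = 2) {
  stopifnot(dataset_gb >= 0, concurrent_sessions >= 1, copy_factor >= 1)
  # Assume every concurrent session works on a dataset of this size.
  dataset_gb * copy_factor * concurrent_sessions
}

# Example: 4 users each working with a 10GB dataset at the same time.
ram_needed <- estimate_ram_gb(dataset_gb = 10, concurrent_sessions = 4)
print(ram_needed)
```

Treat the result as a lower bound: the copy factor can be higher for workloads that create many intermediate objects.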

We do have an app available that tries to provide a general rule of thumb here: https://gallery.shinyapps.io/instanceCalc/

If you would like to discuss this with someone, our Customer Success or Solutions Engineering teams would be happy to do so. We usually recommend picking a general size based on the best available estimate and then:

I hope that helps! Are there any remaining questions that we can assist with?

Since you mention load balancing, I want to be clear that load balancing of RStudio Server Pro is usually an active-passive setup. It is possible to enable fail-over if the master node goes down, but traffic should only be routed to one node at a time. RStudio Server Pro will take care of routing R sessions to the most desirable node using your chosen balancing method.

Hi Cole,
Thank you for your response. Would it be possible for you to elaborate on the following sentence?
"R is generally single threaded (although there are packages that expand this functionality)"
What kinds of packages are available to make an R program multi-threaded? Can any R program be written to leverage multi-threading?

Is there any way of failing over the R session itself? For example, if the server where R is executing crashes, can the session that was running on it be gracefully moved to another R server?
Thanks in advance for your response.

The CRAN task view on High-Performance and Parallel Computing with R describes many ways to make use of explicit and implicit parallel programming.
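As one small example of explicit parallelism, here is a minimal sketch using the `parallel` package that ships with R. The worker count of 2 and the toy function `slow_square` are assumptions for illustration; real workloads would choose the number of workers based on the cores available on the server.

```r
library(parallel)

# Toy task (hypothetical): pretend each call is expensive.
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

n_workers <- 2  # assumption; tune to the cores available to each user

# Start separate R worker processes and spread the inputs across them.
cl <- makeCluster(n_workers)
results <- tryCatch(
  parLapply(cl, 1:8, slow_square),
  finally = stopCluster(cl)  # always shut the workers down
)

squares <- unlist(results)
print(squares)
```

Note that each worker is its own R process with its own memory, so parallel code of this kind raises both the CPU and the RAM requirements per user.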

If you use an alternative BLAS (Basic Linear Algebra Subprograms) library, e.g. Intel MKL, ATLAS or OpenBLAS, then your matrix algebra calculations can be multi-threaded. For an overview, see Colin Gillespie's book Efficient R Programming, specifically chapter 3.5, "BLAS and alternative R interpreters".
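To see which BLAS your R installation is linked against, something like the following sketch should work on R 3.4.0 or later. The matrix size of 500 is an arbitrary assumption, and the timing is only a crude indicator, not a proper benchmark.

```r
# Report the BLAS and LAPACK libraries this R session is using.
# On some platforms the BLAS entry may be an empty string.
blas_path <- extSoftVersion()[["BLAS"]]
lapack_path <- La_library()
cat("BLAS:  ", blas_path, "\n")
cat("LAPACK:", lapack_path, "\n")

# Crude timing of a matrix multiplication, which is delegated to BLAS.
n <- 500  # arbitrary size chosen for illustration
a <- matrix(rnorm(n * n), nrow = n, ncol = n)
elapsed <- system.time(b <- a %*% a)[["elapsed"]]
cat("Multiplying two", n, "x", n, "matrices took", elapsed, "seconds\n")
```

Running the same snippet before and after switching BLAS libraries gives a quick, informal sense of the difference.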

Along with what @andrie said, yes, any R program could theoretically be written to leverage multi-threading. It is just a question of packages, code, and the skill of the developer.

I am unaware of any way that the R session itself could be saved from a crashing server. In-flight computations are in RAM, so I do not think there is any real way to save them. You could always dispatch jobs to Spark or Hadoop, where there is some level of built-in redundancy to deal with node failures.
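Since in-flight state lives only in RAM, one common mitigation (my own suggestion, not a feature of RStudio Server) is to have long-running code write its progress to disk so it can be resumed after a crash. A minimal sketch, where `run_with_checkpoints` and the checkpoint file name are hypothetical:

```r
# Hypothetical helper: process inputs one at a time, saving progress to disk
# after every step so a restarted R session can pick up where it left off.
run_with_checkpoints <- function(inputs, step, path) {
  state <- if (file.exists(path)) {
    readRDS(path)                          # resume from an earlier run
  } else {
    list(done = 0L, results = list())      # fresh start
  }
  while (state$done < length(inputs)) {
    i <- state$done + 1L
    state$results[[i]] <- step(inputs[[i]])
    state$done <- i
    saveRDS(state, path)                   # persist progress after each step
  }
  unlist(state$results)
}

# In practice the file should live on storage that survives the server,
# e.g. a shared network drive; tempdir() is used here only for the demo.
checkpoint_file <- file.path(tempdir(), "job_checkpoint.rds")
out <- run_with_checkpoints(1:5, function(x) x * 10, checkpoint_file)
print(out)
```

This does not fail over the session itself; it only limits how much work has to be redone when the job is restarted on another server.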