Rules of thumb for scaling RSP clusters

RStudio Server Pro customers often ask, "How large should my server nodes be?" I typically answer with the configuration and sizing recommendations on our support site. But how do you scale out a cluster of RStudio Server Pro nodes?

Here are some general rules of thumb from personal experience. These are not official RStudio recommendations, just things that have worked for me; your individual experience may vary greatly.

  • R is single-threaded by default, so every active user with heavy compute needs will need their own core. That means 16 concurrent users will need 16 cores (or threads).
  • For every large session, multiply the size of your data by 3.5 to estimate the total memory footprint. So a session holding 10 GB of "raw" data will need roughly 35 GB of available memory to function well. (A back-of-the-envelope sizing sketch follows this list.)
  • For really heavy workloads, consider more, smaller nodes instead of fewer, larger nodes in your cluster. Generally speaking, power users fare better when there are fewer users on a node, especially if they are spawning jobs or running heavy compute with the system BLAS libraries (see the thread-capping sketch below).
  • Turn on the admin dashboard, and open it to your power users so they can police themselves. Consider piping each node's metrics to a Carbon-based monitoring tool such as Graphite (a config sketch follows below).
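To make the first two rules concrete, here is a back-of-the-envelope sizing helper in R. The function name, the one-core-per-user assumption, and the 3.5x multiplier are all my own heuristics, not official RStudio guidance:

```r
# Rough node sizing from the rules of thumb above (personal heuristics,
# not official RStudio numbers). size_node() is just an illustrative helper.
size_node <- function(concurrent_users, raw_data_gb_per_user,
                      mem_multiplier = 3.5) {
  list(
    cores_needed = concurrent_users,  # one core/thread per active heavy user
    memory_gb    = concurrent_users * raw_data_gb_per_user * mem_multiplier
  )
}

size_node(concurrent_users = 16, raw_data_gb_per_user = 10)
#> $cores_needed
#> [1] 16
#>
#> $memory_gb
#> [1] 560
```

In other words, sixteen power users each working with 10 GB of raw data would call for roughly 16 cores and 560 GB of RAM across the cluster, which is exactly why the next rule pushes toward more, smaller nodes.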
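On the BLAS point: a multithreaded system BLAS such as OpenBLAS will happily grab every core on the node for a single matrix operation, which is brutal on a shared box. One way to keep sessions polite is to cap BLAS threads per session; the RhpcBLASctl package on CRAN can do this for common BLAS builds (assuming your BLAS is one it supports):

```r
library(RhpcBLASctl)  # CRAN package for querying/setting BLAS thread counts

blas_get_num_procs()     # threads BLAS would use by default on this node
blas_set_num_threads(2)  # cap this session at 2 BLAS threads
```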
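For the dashboard and Graphite piece, the relevant settings live in rserver.conf and monitor.conf, roughly like the sketch below. Treat the group name, host, and port as placeholders, and double-check the exact key names against the RStudio Server Pro admin guide for your version:

```
# /etc/rstudio/rserver.conf -- turn on the admin dashboard and limit it
# to a group of power users ("rstudio-power-users" is a placeholder)
admin-enabled=1
admin-group=rstudio-power-users

# /etc/rstudio/monitor.conf -- ship per-node metrics to Graphite's
# Carbon daemon (host and port below are placeholders)
graphite-enabled=1
graphite-host=graphite.example.com
graphite-port=2003
graphite-client-id=rsp-node-1
```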

If you have experience scaling out RStudio Server Pro clusters, I would love to hear what rules of thumb worked for you!

Nathan
