Parameters to consider for sizing an RStudio Server Pro server

Hello,
Once users are onboarded to RStudio Server Pro, we can see usage trends and fine-tune server resources after the fact. Without any trend data beforehand, though, does anyone have recommendations on which criteria to consider and what questions to ask end users? If someone has such a questionnaire and can share it here, it would be a great help.
Thanks in advance.

This is a fantastic question. I'm honestly kind of hoping somebody chimes in with a questionnaire or similar resources. 🙂 Some of the factors I keep in mind are below:

  • How many users will access the system concurrently?
    • The number of users concurrently accessing the system is one of the principal determinants of "load." For example, if 10 users have access to the system but only 1 is on it at any given time, then your load is 1.
    • Time zones, working hours, etc. play a role here.
    • It's also worth keeping in mind what platform adoption has looked like in the past. Are users excited for the change? Will adoption require migrating code from a legacy system? These things can affect what your server will look like in its early days.
  • What user groups exist?
    • Power users, novice users, and business users will all behave differently on the system
    • Specifically, novice users may need more governance to be sure that they do not accidentally consume too many resources
    • On the other hand, some power users need governance to be sure they do not intentionally consume too many resources
    • This can be managed proactively with User and Group Profiles (a sketch of the profiles file follows this list)
  • What are users doing?
    • Here, the biggest concerns are RAM and CPU usage
    • If users work with large datasets, you will need to allocate more RAM accordingly
    • Today, R is a heavy RAM consumer and often needs roughly 2 copies of a dataset in RAM. There are programming patterns around this, and for big data it is a best practice for power users to get familiar with them, specifically offloading work to a database or Spark (a database offloading sketch follows this list).
    • From a CPU perspective, you want to be sure the box will not become CPU bound. R is single threaded by default, so a user can consume at most 1 core per actively working R session (in practice, often less). Heavily CPU-intensive or explicitly parallelized operations will consume more (a parallelism sketch follows this list).
  • What is the expectation for uptime?
    • This will affect the "buffer" that you build in
    • This can also determine whether segregating into several nodes is preferable (i.e. if one user finds a way to burn all the resources on a given machine, can other users still work on another?)
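
On the governance point: in RStudio Server Pro, User and Group Profiles live in `/etc/rstudio/profiles`. Here is a minimal sketch; the group name and limit values are placeholders, so check the Admin Guide for the exact fields your version supports:

```ini
# /etc/rstudio/profiles -- sections are evaluated top to bottom,
# so later, more specific sections override earlier ones
[*]
# conservative defaults for everyone, novice users included
max-memory-mb = 4096
max-processes = 100

[@power-users]
# a trusted group gets more headroom
max-memory-mb = 16384
max-processes = 200
```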
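
On offloading work to a database: the dplyr/dbplyr pattern below translates the heavy computation to SQL so it runs in the database, and only the small aggregated result lands in R's memory. The driver, table, and column names here are made up for illustration:

```r
library(DBI)
library(dplyr)

# placeholder connection -- substitute your own driver and credentials
con <- DBI::dbConnect(RPostgres::Postgres(), dbname = "warehouse")

# a lazy reference to the table: no rows are pulled into R yet
sales <- tbl(con, "sales")

# the pipeline is translated to SQL and executed in the database;
# collect() brings back only the summary rows
monthly <- sales %>%
  group_by(month) %>%
  summarise(revenue = sum(amount, na.rm = TRUE)) %>%
  collect()
```

sparklyr follows the same pattern against a Spark cluster.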
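
And on CPU: a session stays on one core until a user opts into parallelism, at which point a single call can occupy several cores at once. A sketch with a stand-in workload:

```r
library(parallel)

detectCores()  # how many cores the box exposes

# this one call can keep 4 cores busy for its duration;
# the anonymous function stands in for any CPU-heavy task
results <- mclapply(
  1:100,
  function(i) mean(rnorm(1e6)),
  mc.cores = 4
)
```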

I think these factors give you a pretty good overall picture for setting up a baseline. Keep in mind that many users will be used to having 8+ GB of RAM and a couple of cores on their desktop. You can combine the factors above into a rough "ratio" to get a feeling for how they might align in your environment; a back-of-the-envelope sketch is below.
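
For example, the back-of-the-envelope math might look like this. Every input is an assumption to replace with your own answers to the questions above:

```r
total_users     <- 50    # users with access to the server
concurrency     <- 0.25  # fraction active at peak (time zones help here)
ram_per_user_gb <- 8     # matches the desktop experience users expect
cores_per_user  <- 1     # R sessions are single threaded by default
buffer          <- 1.5   # headroom, driven by your uptime expectations

peak_users <- ceiling(total_users * concurrency)             # 13
ram_gb     <- peak_users * ram_per_user_gb * buffer          # 156
cpu_cores  <- ceiling(peak_users * cores_per_user * buffer)  # 20
```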

We have a naive little Shiny app that tries to make these determinations for you, but beware that it can overestimate a bit.

https://gallery.shinyapps.io/instanceCalc/

Further, this article might be helpful:

If you do go through this exercise, I think it would be super helpful for others if you don't mind sharing your experience: how you approached the problem and how well your initial estimates matched reality!
