Once users are onboarded onto RStudio Server Pro, we can see usage trends and fine-tune server resources after the fact. Without any trend data beforehand, though, does anyone have recommendations on what criteria to consider and what questions to ask end users? If someone has such a questionnaire and can share it here, it would be a great help.
Thanks in advance.
This is a fantastic question. I'm honestly kinda hopeful that somebody chimes in with resources like a questionnaire / etc. Some of the factors I keep in mind are below:
- How many users will access the system concurrently?
- The number of users concurrently accessing the system is one of the principal determiners of "load." For example, if 10 users have access to the system but only 1 is on the system at any given time, then your load is 1
- Time zones, working hours, etc. play a role here
- It's also worth keeping in mind what platform adoption has looked like in the past. Are users excited for the change? Will adoption require migrating code from a legacy system? These things can affect what your server will look like in its early days
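As a rough illustration of the concurrency point above, here is a tiny sketch; the concurrency ratio and time-zone overlap figures are pure assumptions you would replace with answers from your own user survey:

```python
# Rough sketch: estimate peak concurrent sessions from a total user count.
# All of the ratios below are assumptions, not recommendations.

def peak_concurrent(total_users, concurrency_ratio, tz_overlap=1.0):
    """Estimate peak simultaneous sessions.

    concurrency_ratio: fraction of users active at the busiest time
    tz_overlap: 1.0 if everyone shares working hours, < 1.0 if spread out
    """
    return max(1, round(total_users * concurrency_ratio * tz_overlap))

# 10 licensed users, ~10% active at once, one time zone -> load of 1
print(peak_concurrent(10, 0.10))        # -> 1
# 200 users, ~25% active, partially overlapping time zones
print(peak_concurrent(200, 0.25, 0.8))  # -> 40
```

Even a crude estimate like this gives you a number to sanity-check against once real usage data starts coming in.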
- What user groups exist?
- Power users, novice users, and business users will all behave differently on the system
- Specifically, novice users may need more governance to be sure that they do not accidentally consume too many resources
- On the other hand, some power users need governance to be sure they do not intentionally consume too many resources
- This can be managed proactively with User and Group Profiles
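For reference, User and Group Profiles in RStudio Server Pro live in `/etc/rstudio/profiles`. A minimal sketch might look like the following; the group names are made up, and you should check the admin guide for the exact settings your version supports:

```
# /etc/rstudio/profiles -- sketch only; group names are hypothetical
[*]
# defaults applied to everyone
max-memory-mb = 4096

[@novice-analysts]
# tighter limits for newer users
max-memory-mb = 2048
max-processes = 50

[@power-users]
# more headroom, but still capped
max-memory-mb = 16384
```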
- What are users doing?
- Here, the biggest concern is RAM / CPU usage
- If users work with large datasets, then you will need to allocate more RAM accordingly
- Today, R is a heavy RAM consumer and often needs ~2 copies of a dataset in RAM. There are programming ways around this, and for big data, it is a best practice for power users to get familiar with these patterns, specifically offloading work to a database or Spark.
- From a CPU usage perspective, you want to be sure the box will not be CPU bound. R is single-threaded by default, so a user can consume at most 1 core per R session that is actively working (although in practice, they will often use less). Heavily CPU-intensive or parallelized operations will consume more CPU
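A back-of-the-envelope RAM calculation often helps here. This sketch assumes 8 bytes per numeric value and the rule of thumb above that R may hold roughly 2 copies of a dataset during manipulation; both are assumptions, and real footprints vary with column types:

```python
# Back-of-the-envelope RAM estimate for an all-numeric data frame in R.
# Assumes 8 bytes per value and ~2 in-memory copies during manipulation.

def dataset_ram_gb(rows, cols, bytes_per_value=8, copies=2):
    """Rough RAM footprint in GB for an all-numeric dataset."""
    return rows * cols * bytes_per_value * copies / 1024**3

# 10 million rows x 20 numeric columns -> roughly 3 GB of working RAM
print(round(dataset_ram_gb(10_000_000, 20), 1))  # -> 3.0
```

If that number is a sizable fraction of per-user RAM, that is a strong signal the workload belongs in a database or Spark rather than in the R session.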
- What is the expectation for uptime?
- This will affect the "buffer" that you build in
- This can also determine whether segregating into several nodes is preferable (i.e. if a user finds a way to burn all the resources on a given machine, can users still use another machine)
I think these factors give you a pretty good overall picture for setting up a baseline. Keep in mind that many users will be used to having 8+ GB of RAM and a couple of cores on their desktop. You can set up a "ratio" of the above factors to get a feel for how they might best align in your environment.
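To make that "ratio" idea concrete, here is one way to combine the factors above into a crude baseline. Every per-user figure here (4 GB RAM, 1 core, 25% peak concurrency, a 25% uptime buffer) is an assumption to be tuned from your questionnaire answers, not a recommendation:

```python
import math

# Crude baseline sizing from the factors discussed above.
# All per-user figures are assumptions -- tune them from your own data.

def baseline_server(total_users, concurrency=0.25,
                    ram_gb_per_user=4, cores_per_user=1.0,
                    uptime_buffer=1.25):
    """Return a (ram_gb, cores) estimate including an uptime buffer."""
    active = max(1, round(total_users * concurrency))
    ram = math.ceil(active * ram_gb_per_user * uptime_buffer)
    cores = math.ceil(active * cores_per_user * uptime_buffer)
    return ram, cores

# 40 total users -> ~10 active at peak -> 50 GB RAM, 13 cores
print(baseline_server(40))  # -> (50, 13)
```

The uptime buffer is where the "expectation for uptime" factor shows up: a higher buffer (or splitting the estimate across several nodes) buys you headroom when one user burns through a machine's resources.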
We have a naive little Shiny app that tries to make these determinations for you, but beware that it can overestimate a bit.
Further, this article might be helpful:
If you do go through this exercise, I think it would be super helpful for others if you don't mind sharing your experience, how you approach the problem, and how well your initial estimations matched reality!