Installing High-Availability RStudio Connect and RStudio Package Manager

Hello,
Our company uses a single RStudio Connect environment. We are now setting up a high-availability (HA) environment with two nodes, and we have the following questions.

  1. Is there step-by-step documentation available as an installation guide?
  2. Do we need to install RStudio Server Pro, RStudio Connect and RStudio Package Manager separately, or do they come as a bundle?
  3. Do we need to install internal load balancers for each of them, or can we use AWS ALBs?
  4. How do we enable sticky sessions so that user sessions are routed properly?
  5. Is a separate PostgreSQL database required for the HA setup, and do we need to set up backups with the help of our DBA?

Any input is appreciated!

You can find the checklist for HA on Connect in the admin guide at https://docs.rstudio.com/connect/admin/load-balancing/#high-availability-load-balancing
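As a rough illustration of what that checklist means for the configuration file, here is a minimal sketch of the settings both Connect nodes might share. It assumes shared storage (e.g. NFS) is mounted at `/mnt/rstudio-connect` on every node, and the host `db.example.com` and the `connect` user and database are hypothetical placeholders for whatever your DBA provisions. Check each setting against the admin guide for your Connect version.

```ini
; /etc/rstudio-connect/rstudio-connect.gcfg
; Identical on both nodes (sketch; host, user and database names are hypothetical)

[Server]
; Data directory on shared storage, mounted at the same path on every node
DataDir = /mnt/rstudio-connect

[Database]
; HA needs PostgreSQL; the default SQLite provider is for single-server installs
Provider = postgres

[Postgres]
; Hypothetical URL: replace user, host and database with what your DBA provides
URL = "postgres://connect@db.example.com/connect"
```

Keeping this file identical on both nodes (for example with configuration management) avoids the nodes disagreeing about where content and metadata live.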

Our best-practice recommendation is to run each product on its own server. This lets you scale each server for the load that its single product places on it. For this reason the downloads and installers are separate, not bundled. See https://docs.rstudio.com/installation/ .

For both RStudio Connect and Package Manager you must provide your own load balancer (with sticky-session support). An AWS ALB should work fine, but your own team should evaluate and configure it. Unfortunately, RStudio can't provide guidance on configuring an ALB.

Both of these products require a PostgreSQL instance, which your DBA should provide and maintain (including backups, HA, etc.). However, the tables on these instances are created and managed by the RStudio products, so your DBAs will not have to create tables or schemas. See https://docs.rstudio.com/connect/admin/database/postgres/ and https://docs.rstudio.com/rspm/admin/database/#database-postgres
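For Package Manager, the equivalent settings live in its own configuration file. Below is a minimal sketch, assuming a hypothetical PostgreSQL server `db.example.com` on which Package Manager gets its own database, separate from any database used by Connect (one PostgreSQL server can host several databases). The `UsageDataURL` line reflects our understanding that recent Package Manager releases keep usage data in a second database; treat it as an assumption and confirm it on the linked database page for your version.

```ini
; /etc/rstudio-pm/rstudio-pm.gcfg
; Sketch only: host, user and database names are hypothetical

[Database]
; Use PostgreSQL instead of the default SQLite
Provider = postgres

[Postgres]
; Main Package Manager database (not shared with Connect)
URL = "postgres://rspm@db.example.com/rspm"
; ASSUMPTION: separate usage-data database; verify this setting for your version
UsageDataURL = "postgres://rspm@db.example.com/rspm_usage"
```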

Hi Andrie,
Thanks for the valuable input. So that means I have to maintain a separate load balancer for each of RStudio Connect and RStudio Package Manager, and both products need only a single PostgreSQL instance to work with. I hope I have got that right.

Let me evaluate this and get back to you in case I have any new queries.

Hi Team,

We have set up a highly available sandbox environment for RStudio Connect, but we have questions about configuring the shared data directory. The RStudio Connect documentation states that the default shared location is /var/lib/rstudio-connect. If I change it to /mnt/rstudio-connect, should I move whatever is in the default location to the new one? And since we have set up PostgreSQL as part of high availability, do we also need the following?

```ini
; /etc/rstudio-connect/rstudio-connect.gcfg
[Server]
DataDir = /mnt/rstudio-connect

[Database]
Dir = /var/lib/rstudio-connect/db
```

Please guide us through the steps.

Hi Andrie,
Does the "sticky session" support on an external load balancer also apply to RStudio Server Pro in a load-balanced setup?
Thank you,
Laura

You can find the load balancer requirements for RStudio Server Pro at https://docs.rstudio.com/ide/server-pro/load-balancing.html#load-balancing

Note that load balancing for RStudio Server Pro has particular “stickiness” requirements, because users must always return to the same R session where their work resides (i.e. their traffic can’t be handled by more than one node). As a result, it’s not enough to simply place multiple RStudio Servers behind a conventional hardware or software load balancer; additional intelligence and routing are required.
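That extra routing is handled by RStudio Server Pro's own load balancer, which is configured in a file on each node rather than on the external load balancer. Here is a minimal sketch of the older `/etc/rstudio/load-balancer` format, assuming two hypothetical hosts `server1.example.com` and `server2.example.com`. Newer releases changed this file (nodes register themselves through a shared database), so follow the linked page for the version you install.

```ini
# /etc/rstudio/load-balancer
# Identical on every node (sketch; hostnames are hypothetical)
[config]
# Start each new R session on the node with the fewest active sessions
balancer = sessions

[nodes]
server1.example.com
server2.example.com
```

The same documentation also calls for an identical `/etc/rstudio/secure-cookie-key` on every node, so that a user signed in through one node is recognised by the others.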