Seeking recommendations on scaling my Shiny app on RStudio Connect

My Shiny app running on RStudio Connect (on-prem infrastructure) supports about 150 concurrent users. Outside of my server code, I am loading a large-ish data.frame for all users to share. The data.frame contains about 5.5 million rows and 30 columns (mostly integers and short strings). I am getting reports that the application is occasionally unable to load. Downtime usually lasts between 5 and 20 minutes. Here are my current performance settings:

Max processes: 18
Min processes: 0
Max connections per process: 20
Load factor: 0.5
Idle timeout per process: 50 (seconds)
Initial timeout: 1200
Connection timeout: 10800
Read timeout: 7200

Any thoughts on these performance settings given the information I have provided? Thank you.

What (error) message do your users see? Are they told that there are no free seats on the app or is the app just slow/unresponsive?

With your parameters, Connect should put 10 users onto each process (Max connections per process times Load factor) until 18 processes serve 180 users. After that, the number of users on each process would increase to 20, serving a total of 360 users. That is much more than your estimate of 150 concurrent users, so I would be surprised if the users were to see a message about "no free seats".
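The arithmetic behind those numbers can be written out as a small sketch. It follows the simplified reasoning above (Load factor scales Max connections per process into a soft limit per process) and is not a full model of how Connect schedules connections; the variable names are made up for illustration.

```r
# Settings taken from the question
max_processes   <- 18
max_connections <- 20    # Max connections per process
load_factor     <- 0.5

# Connections a process takes before Connect prefers to start another process
# (simplified reading of the Load factor setting)
soft_limit <- max_connections * load_factor

# Users served while every process stays at the soft limit
soft_capacity <- max_processes * soft_limit

# Ceiling once every process is filled to Max connections per process
hard_capacity <- max_processes * max_connections

cat("Soft limit per process:", soft_limit, "\n")
cat("Users at soft limit:   ", soft_capacity, "\n")
cat("Users at hard limit:   ", hard_capacity, "\n")
```

Changing `load_factor` or `max_connections` in this sketch shows how many processes, and therefore how many copies of the data, a given user count would need.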

However, it might be the case that a single process cannot serve 10 (or even 20) users. Keep in mind that R is single-threaded: if one user triggers some compute-intensive action, all other users connected to the same process have to wait for that computation to finish. You can work around this by lowering Max connections per process or Load factor, but that comes at the cost of an increased memory footprint, since you will need more processes for the same number of users. An alternative approach would be asynchronous programming, e.g. using the promises package.
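As a rough illustration of the asynchronous approach, here is a minimal sketch using promises together with the future package. `slow_summary()` is a hypothetical stand-in for whatever expensive step your app has, and the worker count is an assumption. Note that anything the future needs is copied to the worker, so passing the full 5.5 million row data.frame into it would be costly; it works best when the worker loads or queries the data itself.

```r
library(shiny)
library(promises)
library(future)

# Run futures in background R sessions (worker count is an assumption)
plan(multisession, workers = 2)

# Hypothetical stand-in for a compute-intensive step
slow_summary <- function(n) {
  Sys.sleep(5)
  data.frame(n = n, mean = mean(seq_len(n)))
}

ui <- fluidPage(
  numericInput("n", "Rows", value = 1000, min = 1),
  actionButton("go", "Compute"),
  tableOutput("result")
)

server <- function(input, output, session) {
  result <- eventReactive(input$go, {
    n <- input$n  # read reactive values before entering the future
    # The work runs in a worker, so this process can keep serving other sessions
    future_promise(slow_summary(n))
  })
  output$result <- renderTable(result())
}

app <- shinyApp(ui, server)
```

This frees the process for other users while the computation runs; the user who clicked the button still waits for their own result.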

Another possibility is the memory footprint of the processes. The system may have to start many R processes for this app, and each of them holds its own copy of the loaded data in RAM. If your server runs out of memory, performance will deteriorate quickly. If you know when your app was slow, you can look for such events in the admin dashboard within Connect or in any other server monitoring tool you are using. If it is a memory issue, you could increase Max connections per process or Load factor, but that comes at the cost of a higher user count, and therefore a higher CPU load, per process. An alternative approach could be to not store the full data in memory. Tools like arrow, duckdb or other database systems can be used to offload much of the preprocessing of the data, so that only aggregated or subsetted data needs to be loaded into R. Of course, whether something like this is feasible depends on the precise use case.
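To give an idea of the "only load aggregated data" approach, here is a minimal sketch with duckdb through DBI. The table name `events` and its columns are invented for illustration, and the tiny in-memory table stands in for your real data, which would more likely live in a DuckDB database file or a Parquet file.

```r
library(DBI)
library(duckdb)

# In-memory database for this sketch; a real app would more likely use
# dbConnect(duckdb::duckdb(), dbdir = "path/to/file.duckdb", read_only = TRUE)
con <- dbConnect(duckdb::duckdb())

# Toy stand-in for the large shared data.frame (names are hypothetical)
dbWriteTable(con, "events", data.frame(
  region = c("a", "b", "a"),
  value  = c(1L, 2L, 3L)
))

# Let the database do the aggregation; only the small result enters R
agg <- dbGetQuery(con, "
  SELECT region, SUM(value) AS total
  FROM events
  GROUP BY region
  ORDER BY region
")

print(agg)

dbDisconnect(con, shutdown = TRUE)
```

With this pattern each R process holds only the query results it currently needs, so the per-process memory footprint no longer scales with the size of the full data set.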