Best practices for setting up RStudio Server Pro/Connect in an Enterprise environment

best-practices

#1

Hi everyone,

Hoping to gather some collective knowledge from this audience to help drive best practices for installation and configuration. We currently operate a two-server environment (Dev/Prod) on RHEL 7.4 with RStudio Server Pro (Dev) and RStudio Connect (Prod).

We are currently on officially supported IT infrastructure but have strong headwinds that threaten our stability. We will be implementing an IT policy where all servers will be updated every month using "yum upgrade" and restarted. We also need to try and install/run applications as service accounts, not root. And finally, we need to isolate the package installation and move them to mounted directories.

The major concern I have is the yum upgrade. This threatens our R installation and R library dependencies. RStudio Server Pro allows compilation from source and the use of mounted directories in /opt/ for multiple concurrent versions of R. But who's to say that a new RHEL release of libxml won't break all the dependent packages? Does each library need to be recompiled against every updated RHEL dependency? If it does, even packrat wouldn't save us.

The other issue is consistency. Is there a simple way to force RStudio Connect to use a specific version of R when content is pushed to it? From what I gather, each application is re-compiled against whatever version of R exists on the Prod server. I'm guessing it has similar capabilities to look in /opt/ for versions of R that are prepared?

Lastly, can anyone provide guidance on how they handle changes to Production applications? Do you consider each app/dashboard an application? Or do you consider the RStudio Connect environment the application? I can see both sides of this and I'm not sure which would be appropriate.

Thanks for the help!


#2

Hi @wolfpack

Do you currently use a staging server as part of your analytic architecture?

This sounds like a great time to advocate for the necessity of a staging server. This article from @nathan covers some of the common use cases of staging servers w/r/t RStudio products: https://support.rstudio.com/hc/en-us/articles/360007833814-RStudio-staging-servers

Use case 1: Testing your computing environment
You can use the staging server to test changes to your compute environment. For example, when you upgrade Linux, R, or RStudio, you will want to test the upgrade in the staging server first. You can also test product configurations before applying them in production. Using staging will help you roll out consistent changes across your environment.


#3

Hi @kellobri,

We do not have a staging server due to lack of funding. That has been one of our major frustrations with the 1-copy-installation with the basic RSConnect license.


#4

Wow, yeah. I can see how frustrating that is. Not having access to a staging server is really going to put you in a tough spot. Have you been thinking about any creative free/low-cost holdover solutions for mimicking a staging environment?

Obviously having funding for an RStudio Connect staging license would be ideal. But if that option is going to be off the table for the time being, you could potentially create a staging environment without the professional products, simply with the goal of validating R package dependencies against system library updates.

I like using VirtualBox + Vagrant + a Configuration Management Tool (like Ansible) on my local machine to set up small testing environments. There are Vagrant boxes freely available for many of the major Linux distributions. You could even install Shiny Server Open-Source and test some manual data product deployments. I know that going this route would put a lot of the work back on you. And doing this kind of testing every month sounds like more than I would be comfortable taking on. There are certain parts that could probably be automated, but it will likely take a while to decide what your goals are and how to go about implementing an automation.

I hope you continue to share what your findings and solution wins turn out to be along the way!


#5

A few ways to handle this. I generally think of each app / dashboard as an application, and Connect allows reverting to previous versions / etc. You can also use "vanity URLs" to abstract away a specific application endpoint from the URL (moving that vanity URL to a new piece of content, after testing, can become the new "live" version).

The question we/you really need to test is: what does yum upgrade do to already-built R packages? From yum upgrade's perspective, you are definitely affecting the whole Connect server, but I am hopeful that there will be some level of protection in the way packages are compiled/upgraded and that you will only need to worry about backwards incompatibilities in the system libraries (which are hopefully few/far between).
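One rough way to test that question after an upgrade is to look for compiled package libraries whose shared-library links no longer resolve. The sketch below is only an illustration: it assumes `ldd` is available on the server, the function names are my own, and the library path in the usage note is hypothetical.

```python
import subprocess
from pathlib import Path


def missing_libs(ldd_output: str) -> list[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # A broken link looks like: "libxml2.so.2 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def scan_r_library(lib_dir: str) -> dict[str, list[str]]:
    """Run ldd on every compiled .so file under an R library directory."""
    broken = {}
    for so_file in Path(lib_dir).rglob("*.so"):
        result = subprocess.run(
            ["ldd", str(so_file)], capture_output=True, text=True, check=False
        )
        libs = missing_libs(result.stdout)
        if libs:
            # Only record files that have at least one unresolved link
            broken[str(so_file)] = libs
    return broken
```

Calling something like `scan_r_library("/opt/R/3.4.4/lib64/R/library")` (substitute your own path) would list the packages that need rebuilding. Note that this only catches libraries that disappeared or changed their soname; a behavioral change inside a library that keeps the same soname would not show up here.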

This sounds like you are interested in learning how Connect chooses an R version? The short answer is that it generally tries to match R version as closely as it can to the one used in development. The longer answer is that you are in control, and it can be configured as you like (depending on the R versions that you make available).
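To make "match as closely as it can" a bit more concrete, here is a toy selection rule in Python. This is not Connect's actual algorithm (the admin guide is the authority on that); the function names and the fallback order are assumptions made purely for illustration.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn '3.4.4' into (3, 4, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_r_version(requested: str, available: list[str]) -> str:
    """Pick an installed R version for content built with `requested` (toy rule)."""
    if requested in available:
        return requested  # exact match wins
    req = parse(requested)
    # Next best: same major.minor, taking the highest patch level
    same_minor = [v for v in available if parse(v)[:2] == req[:2]]
    if same_minor:
        return max(same_minor, key=parse)
    # Otherwise: smallest distance, comparing major, then minor, then patch
    return min(
        available,
        key=lambda v: tuple(abs(a - b) for a, b in zip(parse(v), req)),
    )
```

With R 3.3.3, 3.4.4 and 3.5.0 installed, content developed on 3.4.3 would land on 3.4.4 under this rule. The practical point is that the set of R versions you install on the server determines what a deployment can be matched to.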

I definitely am interested to see what happens to R packages when the underlying dependency is upgraded. I'm also curious if your org could run the yum upgrade on RStudio Server Pro first, so at least you have some runway to debug issues before deploying those same library changes to RStudio Connect. As @kellobri suggested, maybe some type of configuration management (Ansible, Puppet, Chef, Terraform, etc.) will help keep the boxes in sync so your RSP server becomes your "staging" server of sorts.
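If the RSP box does get upgraded first, it may help to record exactly which system packages changed, so you know what to watch for before the same upgrade reaches Connect. A minimal sketch, assuming you capture a snapshot before and after with `rpm -qa --qf '%{NAME}.%{ARCH} %{VERSION}-%{RELEASE}\n'` (the function names below are hypothetical):

```python
from typing import Optional


def parse_snapshot(text: str) -> dict[str, str]:
    """Parse lines of 'name.arch version-release' into a {name: version} dict."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            packages[parts[0]] = parts[1]
    return packages


def diff_snapshots(
    before: str, after: str
) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return {package: (old_version, new_version)} for every package that changed.

    None means the package was absent in that snapshot.
    """
    old, new = parse_snapshot(before), parse_snapshot(after)
    changes = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changes[name] = (old.get(name), new.get(name))
    return changes
```

The same diff can be run between the Dev and Prod snapshots to see whether the two boxes have drifted apart.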


#6

Thank you @kellobri and @cole for the responses.

I am very concerned about the updated system libraries caused by yum upgrade. We have pitched the idea of forced versioning on specific system libraries but that was not acceptable to our IT group.

Theoretically it should not be a major concern since packages like gdal and libxml aren't likely to have major changes, but who knows. It would be nice if Connect offered some way to use specific system libraries that are pre-compiled (like packrat). I guess this is how people use docker but that would require an unknown level of effort.

@cole, your comment about using a production vs. development version with vanity URLs is how we currently handle this issue. My question was (unclearly) focused on how to handle this in an SLA environment. What are the thoughts on having an SLA for each app vs. the Connect application?

@kellobri I think the idea of using the Shiny Server as the pseudo staging environment might be the interim solution. Our upgrade pathway is to roll out updates to Dev servers 1 month prior to production. This would allow us to test changes prior to them being made on Prod. Maybe we could run some automated scripts using shinytest on those applications but we cannot take advantage of Connect since it lives on Prod. I guess we'll just write some cron jobs instead.
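For the cron side, something as simple as the following might be a starting point. It is only a sketch: it checks that each app URL answers with HTTP 200, the URLs would be our own, and the function names are made up for illustration. It says nothing about whether the reactive parts of an app still work, which is where shinytest would come in.

```python
import urllib.error
import urllib.request


def check_url(url: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Return (ok, detail) for a single URL; ok is True only on HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 500)
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, connection refused, timeout, and so on
        return False, str(exc)


def run_checks(urls, checker=check_url):
    """Check every URL; `checker` is injectable so the logic can be tested offline."""
    return {url: checker(url) for url in urls}


def failures(results):
    """List the URLs whose check did not succeed."""
    return [url for url, (ok, _detail) in results.items() if not ok]
```

A cron wrapper could call `failures(run_checks([...]))` after each monthly upgrade on Dev and send a notification if the list is non-empty.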