Global (R) config file (or alternative) on RStudio Connect

I'm interested in solutions for sharing settings across all RStudio Connect content. I'm using a global configuration file (with the R package config) to manage API keys, database credentials, etc.

Some other approaches I'm aware of are "make everything environment variables", "store everything you need in each project", "set it manually in each piece of content through the RStudio Connect GUI", or "use something like Azure Key Vault".

I'd love to hear what solutions others use, and what they think the trade-offs are.

This is a fantastic question! The approach I generally use is to set environment variables manually in each piece of content through the RStudio Connect dashboard / user interface; however, that has a very noteworthy downside: it's tedious.

Do you ever have various "classes" of content or collaborators? E.g., certain publishers share credentials / keys, others have their own, etc.? Or does all content use the same credentials / configuration?

"Scoping" of particular secrets to certain content is the tricky part of this problem, so wondering if you have devised a solution to that! :smiley:

I think I can share experience on this.

For databases, we use DSNs defined in odbc.ini to make some database connections available to all content on the server. For the others, per-content environment variables work fine.
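On the content side, a DSN keeps the connection details (and, depending on the driver, the credentials) in the server's ODBC configuration instead of in the project. A minimal sketch, assuming the DBI and odbc packages are installed; the DSN name "SalesDW" and both helper names are hypothetical:

```r
# TRUE if `dsn` is defined in the ODBC driver manager's configuration
# (e.g. a server-wide odbc.ini). `sources` defaults to odbc's own listing
# but can be passed in directly, which keeps the check testable.
dsn_available <- function(dsn, sources = odbc::odbcListDataSources()) {
  dsn %in% sources$name
}

# Open a connection through a shared DSN, failing with a clear message
# if this server doesn't define it.
connect_shared_dsn <- function(dsn) {
  if (!dsn_available(dsn)) {
    stop("DSN '", dsn, "' is not defined on this server")
  }
  DBI::dbConnect(odbc::odbc(), dsn = dsn)
}

# Hypothetical usage inside deployed content:
# con <- connect_shared_dsn("SalesDW")
```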

Also, to set global environment variables or other configuration for all content (like Kerberos initialization), we use a supervisor script, and that works well.

Otherwise, I find that config files plus per-content environment variables are an easy, working solution: environment variables for secrets, and a config file for other deployment configuration.

However, I don't yet have a solution for group configuration, apart from shared packages for similar content, and those are not ideal for sharing secrets.

I am eager to read what others do!

Thank you both for the thoughtful responses! You've both certainly given me a lot to think about.

DSNs

One thing I neglected to mention in my post is that people typically work on their own machines. Keeping people's DSNs standardized across Windows machines relies (AFAICT) on everyone manually entering the information identically. If we all worked on RStudio Server (which we have), I think we would also use DSNs. I chose to keep everything in config files because the setup needs to be cross-platform and easy to deploy for people of varying backgrounds. I'm hopeful we will transition all of our work to the server within the next year.

Scoping permissions

We don't have different classes of content or collaborators right now. We're a team of 6, and only 2 of us regularly deploy content for others to consume.

I would scope permissions through different RunAs users on Connect. The default configuration would only hold secrets we were comfortable with anyone using. Each config file would be named after a user, so content could locate its file using Sys.getenv("USER"). Alternatively, the supervisor script could set the environment variable R_CONFIG_FILE based on the user. Each file might be self-contained; or, if each file is a subset of the next, the active config from each would be merged, with values in "higher permission" files taking precedence over those in "lower permission" files. File permissions would be set so that no inappropriate user could read a given file.

I think this would go a long way to reducing duplication, and since they'd all start from the same template and be stored in the same place, it would still be pretty simple to update anything which needed it.

If we needed more than 3 or 4 groups, I'm not sure this would be the greatest idea since I construct the config files by hand.
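For the layered variant, a minimal base-R sketch might look like this. The directory /etc/rsc-config, the one-file-per-RunAs-user naming, and both function names are hypothetical, and it assumes USER reflects the RunAs account on your server. In practice each layer would come from something like config::get(file = ...); plain lists stand in here so the merge logic is visible.

```r
# Hypothetical layout: one YAML file per RunAs user / permission level,
# e.g. /etc/rsc-config/default.yml and /etc/rsc-config/finance.yml.
config_file_for_user <- function(user = Sys.getenv("USER"),
                                 dir = "/etc/rsc-config") {
  file.path(dir, paste0(user, ".yml"))
}

# Merge config layers given from lowest to highest permission.
# utils::modifyList() lets each later layer override earlier ones key by key,
# recursing into nested lists, so keys a layer doesn't mention fall through.
merge_config_layers <- function(...) {
  Reduce(utils::modifyList, list(...))
}

# Example layers (plain lists standing in for config::get() results).
default_layer <- list(
  db = list(host = "db.example.internal", user = "reader"),
  public_api_key = "shared-key"
)
finance_layer <- list(db = list(user = "finance_writer"))

merged <- merge_config_layers(default_layer, finance_layer)
str(merged)
```

One caveat of modifyList(): a NULL value in a higher layer deletes that key instead of overriding it.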

Connect env vars

One issue I have with setting them through Connect is standardization. Ideally, the information for one connection should be represented in exactly the same way across all pieces of content, and I think it would be easy for small differences to creep in. Is there a way to set environment variables through the Connect API? That would make things much more convenient. I might be blowing this out of proportion given the amount of content we have deployed and the frequency of new content...
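One low-tech way to get some of that standardization without an API: keep a single canonical list of variable names per connection (for example in a small shared package) and have each piece of content check it at startup. A rough sketch; the connection names, variable names, and helper are all hypothetical:

```r
# Hypothetical canonical spec: connection name -> required env var names.
# Keeping this in one shared place means every app spells them identically.
connection_spec <- list(
  warehouse = c("WAREHOUSE_UID", "WAREHOUSE_PWD"),
  crm_api   = c("CRM_API_KEY")
)

# Return the required variable names for `connections` that are unset or empty.
missing_env_vars <- function(connections, spec = connection_spec) {
  unknown <- setdiff(connections, names(spec))
  if (length(unknown) > 0) {
    stop("Unknown connection(s): ", paste(unknown, collapse = ", "))
  }
  required <- unique(unlist(spec[connections], use.names = FALSE))
  required[!nzchar(Sys.getenv(required))]
}

# Hypothetical usage at the top of an app or report:
# missing <- missing_env_vars("warehouse")
# if (length(missing) > 0) stop("Set these in Connect: ", paste(missing, collapse = ", "))
```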

Another is repeating yourself. If, for some reason, a connection needs to switch to a different username and password, it has to be updated in many places. If things aren't standardized, it might even be easy to miss a place where it needs updating. I think an API interface would also take care of this problem.
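For the API route, something like the sketch below could push one change to many pieces of content. It assumes a Connect version that exposes PATCH /__api__/v1/content/{guid}/environment, accepting a JSON array of name/value pairs and authenticating with an API key in a "Key" Authorization header; please check the API reference for your server version before relying on it. The helper names are hypothetical, and sending requests requires the httr package.

```r
# Request body for one variable: a list of name/value pairs, which httr
# encodes as a JSON array like [{"name": "...", "value": "..."}].
env_var_payload <- function(name, value) {
  list(list(name = name, value = value))
}

# Assumed endpoint for a piece of content's environment variables.
content_env_url <- function(server, guid) {
  paste0(sub("/+$", "", server), "/__api__/v1/content/", guid, "/environment")
}

# Push the same variable to every listed piece of content.
set_env_everywhere <- function(server, api_key, guids, name, value) {
  if (!requireNamespace("httr", quietly = TRUE)) stop("httr is required")
  for (guid in guids) {
    resp <- httr::PATCH(
      content_env_url(server, guid),
      httr::add_headers(Authorization = paste("Key", api_key)),
      body = env_var_payload(name, value),
      encode = "json"
    )
    # Stop at the first failure so a partial update is easy to spot.
    httr::stop_for_status(resp, task = paste("update content", guid))
  }
  invisible(guids)
}
```

Hypothetical usage: set_env_everywhere("https://connect.example.com", Sys.getenv("CONNECT_API_KEY"), guids, "DB_PWD", new_pwd), where guids is a vector of content GUIDs you maintain.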

A third is knowledge. We're starting to collaborate outside our group, and a setup which "just works" after they follow very few steps (i.e., copy our base config file and set 1 environment variable) is very appealing. Explaining that for each connection they must instantiate it by putting credentials in environment variables, etc. is more overhead. Our org suffers from low data discoverability, so having all the connections in a centralized place helps highlight the variety of sources available.

I think this is a good strategy when you're dealing with users who have some background (or interest) in software development, but I fear it would be off-putting to many people.

Final thought

Use env vars through Connect for API keys, but source them through the config file using something like connect_api_key: !expr Sys.getenv("RSC_API_KEY"). This gives a uniform interface to access secrets and configuration values, so users don't have to remember the difference between the two.
I presume few people would need to add API keys, so most would not have to worry about this step, which also appeals to me.
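A sketch of that pattern, assuming the config package is installed. The YAML is written to a temporary file only so the example is self-contained, and db_host, connect_api_key, and RSC_API_KEY are placeholder names. On Connect, RSC_API_KEY would be set in the content's environment-variable settings rather than with Sys.setenv(), and config = "default" is passed explicitly so the example doesn't depend on R_CONFIG_ACTIVE, which Connect sets for deployed content.

```r
# Hypothetical shared config.yml: plain settings live in the file, while the
# API key is read from an environment variable when config::get() evaluates
# the !expr tag.
yaml_lines <- c(
  "default:",
  "  db_host: \"db.example.internal\"",
  "  connect_api_key: !expr Sys.getenv(\"RSC_API_KEY\")"
)
cfg_path <- tempfile(fileext = ".yml")
writeLines(yaml_lines, cfg_path)

# Simulate the variable Connect would inject at runtime.
Sys.setenv(RSC_API_KEY = "demo-key")

if (requireNamespace("config", quietly = TRUE)) {
  cfg <- config::get(config = "default", file = cfg_path)
  # Secrets and plain settings come back through the same interface.
  print(cfg$db_host)
  print(cfg$connect_api_key)
}
```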

Thanks again for the great responses.