Design patterns for RStudio Connect with RStudio Package Manager

Dear all,

I have a question regarding best practices for deployment on RStudio Connect in combination with RStudio Package Manager (or any other internal package repo).

Our current design pattern for developing products in R mandates using R packages for all content. All types of data science products, whether utility packages, RMarkdown reports, plumber APIs or Shiny dashboards, should be encapsulated in a package to encourage environment encapsulation, tests, documentation and modularity. We use Golem for Shiny, and a modified setup for APIs/MDs, placing content files in inst/api or inst/markdown.

In my view, there are two options for deploying such content packages to Connect:

  1. Publish product packages to RSPM or another internal repo, and deploy only the content file to Connect (e.g. app.R, entrypoint.R, report.md). The content file takes a dependency on the package hosted on RSPM. This would require the package to live on RSPM, even for development and testing.
  2. Deploy the full package to Connect, and use pkgload::load_all or similar in the content file. This allows publishing the product directly to Connect without first having to put the package on RSPM.
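
To make option 2 concrete, here is a minimal sketch that can be run locally. It assumes the pkgload package is installed; the package name myproduct and its run_app() function are hypothetical stand-ins, and the throwaway package in a temp directory only exists so the example runs on its own. The last two statements are what the content file itself would contain.

```r
# Build a throwaway package skeleton (stand-in for a real product package).
pkg <- file.path(tempdir(), "myproduct")
dir.create(file.path(pkg, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines(
  c("Package: myproduct", "Version: 0.0.1", "Title: Demo Product Package"),
  file.path(pkg, "DESCRIPTION")
)
writeLines("export(run_app)", file.path(pkg, "NAMESPACE"))
writeLines('run_app <- function() "app started"', file.path(pkg, "R", "run_app.R"))

# Content file (option 2): load the package from its source tree
# rather than from an installed copy. In a real deployment the path
# would be "." (the package root bundled with the content).
pkgload::load_all(pkg, export_all = FALSE, helpers = FALSE, quiet = TRUE)

# Hand over to the entry point defined inside the package.
result <- myproduct::run_app()
```

With option 1, the content file would shrink to a library(myproduct) call followed by run_app(), with the package installed from the repository instead.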

Personally, I like option 1 best, as it seems more "pure" in terms of dependency separation. It also forces users to put all packages on RSPM, which allows for reuse of functions, etc. However, in the traditional DevOps mindset, the deploy artifact seems to be the full software package.

Any thoughts would be appreciated!

Thanks

Hi @Jwaage,

This is a really interesting question. We don't often see people putting everything into packages, but I can see why it'd be helpful.

Can you say a little more about why you want the packages deployed on RStudio Package Manager? I assume people will be git clone-ing the projects to work on them rather than install.packages-ing them from RStudio Package Manager.

In terms of how the install process would work, I think the best option is to include a deploy function in your app, which, as I understand it, is option 2.

One nice RStudio Connect feature this setup forecloses is git-backed deployment. RStudio Connect currently can only add other files in subdirectories of the ones where the content file is, so you couldn't add files in the R/ directory if the actual app is in inst/.

Hi @alexkgold, thanks for the answer.

As you state, coders would obviously git clone repos down for development, but, if going for scenario 1, packages need to be deployed to RSPM for Connect to be able to pick them up during deployment of the app file, which doesn't come with the package itself, but only takes a dependency on it.
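
A rough sketch of what scenario 1 implies on the publishing side, assuming Connect resolves packages from the repositories configured when the content is deployed. The repository URL and the package name myproduct are hypothetical placeholders.

```r
# Hypothetical internal repository URL; replace with your own RSPM endpoint.
rspm_url <- "https://rspm.example.com/internal/latest"

# Put the internal repo first so the product package resolves from it,
# with public CRAN as a fallback for everything else.
options(repos = c(RSPM = rspm_url, CRAN = "https://cloud.r-project.org"))

# The deployed content file (e.g. app.R) would then be as small as:
#   library(myproduct)   # hypothetical package, installed from RSPM
#   run_app()            # hypothetical exported function
```

The point is that the package version on RSPM has to exist before the app file can be deployed, since the app file carries no package code of its own.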

I really think the R package is the right vehicle for enterprise data science projects of any kind, as it has a very mature ecosystem for testing, development and encapsulation.

/J

Hi @Jwaage,

That makes sense. I think the need to simultaneously deploy the package to RSPM and take a dependency on the deployed version is the reason #1 might not be a great option. There's a circular dependency that will present a particularly big roadblock if you're trying to do git deployment. The content can't be deployed to RStudio Connect until the package has been built on RStudio Package Manager, but those deployments are supposed to come from the same commit.

A more common pattern we see is each project having a package associated. So you get the advantage of documenting the functions and features of the analysis, but avoid the complexity of the deployed content actually being part of the package.

We've got an example of that pattern here if you're interested - that's a project with a number of different deployed assets to ingest, model, serve, and visualize model predictions. All of the work across the different content is being done in a package, but the deployed content itself sits next to the package, rather than inside.