RStudio Connect: Git/cron Integration Approach

Hi All,

I want to vet an approach I'm using for Git deployment and scheduled data updates for Shiny apps on RStudio Connect.
(I'm on Ubuntu 16.04)

A little background:
We were on Shiny Server Pro before and this year switched to RStudio Connect.

We've been super impressed: we already have ~30-40 publishers, and with RStudio Server, our entire company is using R or consuming R reporting.

However, moving from Shiny Server Pro left us with 30-40 legacy app migrations, each with dependencies on cron and a Git deployment strategy.


I needed a way to let some publishers use Git to deploy to RStudio Connect and allow them to continue to manage their data via automated scripts on the server.

I found the following solution, which may help others (or which may be a giant mistake). Here were my steps:

  1. Create a new directory for these kinds of apps at /path/to/shiny-server-apps/ and migrate all old apps there. chown this directory to a user called rstudio-connect-legacy.
  2. For each app, create a "pointer app", an example follows:

(In a new directory, <app_name>_pointer)

app.R

# Change Directory to the old directory
setwd('/path/to/shiny-server-apps/app_1')
# load namespaces of all dependencies
# we don't want to load libraries as that will force a load order
# let the app load its own libraries
loadNamespace('shiny')
loadNamespace('shinydashboard')
# If there is a global file, source it
source('global.R')
# Run app in directory
shinyAppDir('.')

Now, we deploy this pointer app rather than the original one and use the user rstudio-connect-legacy to run the app on the server.

Users can deploy just like they would on Shiny Server Pro, by pulling their changes into the app directory. They can also run cron scripts because they can do whatever they have the permissions to do on the server. The only time this pointer app will need to be redeployed is if the package dependencies change.

Thoughts? My real concern here is that I was lazy and put all apps under rstudio-connect-legacy, mainly because creating an account for each app was going to be a hassle. The result is that, technically, these apps can see one another and could interact negatively (the reverse is also true: they could interact positively).

However, this way only an rstudio-admin can make an app that interacts with these, akin to our behavior in Shiny Server Pro.

Thoughts?

I have a function that generates this:

# Make a pointer app from an app directory.
# Run deployApp() from the rsconnect package afterwards.
make_pointer_app <- 
    function(
        from = getwd(), 
        # default: a sibling directory named <app_name>_pointer
        to_dir = sprintf('%s_pointer', sub('/+$', '', from)),
        output_to_file = TRUE, ...) {
  
  # Compare file names case-insensitively (ui.R vs. ui.r)
  app_files <- tolower(list.files(from))
  if (!all(c('ui.r', 'server.r') %in% app_files) && !'app.r' %in% app_files) {
    stop('Not a shiny app: ', from)
  }
  
  if (dir.exists(to_dir)) {
    stop('Directory already exists: ', to_dir)
  }
  
  message('migrating ', from)
  base <- expression()
  base[[length(base) + 1]] <- substitute(setwd(x), list(x = from))
  
  # Add dependencies
  depends <- rsconnect::appDependencies(from)
  
  # for each dependency, load the namespace of the package in question
  for (pkg in as.character(depends$package)) {
    base[[length(base) + 1]] <- substitute(loadNamespace(x), list(x = pkg))
  }
  
  # if global exists, we need to source this first, and then run the app
  global_file <- list.files(from)[tolower(list.files(from)) == 'global.r']
  if (length(global_file) > 0) {
    base[[length(base) + 1]] <- substitute(source(x), list(x = global_file[1]))
  }
  
  base[[length(base) + 1]] <- quote(shinyAppDir('.'))
  
  if (output_to_file) {
    dir.create(to_dir)
    writeLines(as.character(base), file.path(to_dir, 'app.R'))
  } else {
    return(base)
  }
  
}
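
To preview what the generator writes without calling rsconnect or touching the file system, the expression-building step can be run on its own. This is only an illustration: the app path and the package names are hard-coded stand-ins for `from` and for whatever rsconnect::appDependencies() would report.

```r
# Illustration only: hard-coded stand-ins for `from` and the dependency list.
app_dir  <- "/path/to/shiny-server-apps/app_1"
packages <- c("shiny", "shinydashboard")

# Start with an empty expression vector and append one call per line of app.R.
pointer <- expression()
pointer[[length(pointer) + 1]] <- substitute(setwd(x), list(x = app_dir))
for (pkg in packages) {
  pointer[[length(pointer) + 1]] <- substitute(loadNamespace(x), list(x = pkg))
}
pointer[[length(pointer) + 1]] <- quote(shinyAppDir("."))

# Each call in the expression vector deparses to one line of source text.
pointer_lines <- as.character(pointer)
cat(pointer_lines, sep = "\n")
```

Building the script from calls rather than pasted strings means paths and package names are quoted by the deparser, so nothing has to be escaped by hand.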

Hi Nick,

Thanks for sharing! When we released RStudio Connect, we knew there'd be some tricky migration scenarios for home-built deployment pipelines built on Shiny Server (Pro). I'm glad you've found a fix that is working for your environment.

In general, we hope that Connect can be used alongside Git deployment pipelines and scheduled updates. (Connect should be able to host the real apps themselves, not just "pointers".)

Using Git / CI deployment pipelines with Connect:

  1. http://docs.rstudio.com/connect/admin/appendix-ci.html

  2. https://github.com/slopp/rsc-ci-test

In the past 6 months we've done some work to make this integration easier. For example, content is now published automatically if a deployment is successful - which means your CI tool can deploy new content without anyone opening the UI and clicking "Publish". In upcoming releases we're hoping to make CI deployment even easier by publicly exposing parts of Connect's API. This will allow your CI tool to do things like tag content without anyone touching the UI.

The pros of this approach are the same as the pros of using Connect for any other Shiny app: Connect continues to handle application sandboxing and package management. Deployed bundles are automatically versioned. Application logs are intuitive and accessible.
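
As a rough idea of what the CI side of a Git-based deployment could look like, here is a minimal sketch of an R script a CI job might run after checkout. It is not taken from the linked guide. It assumes a recent rsconnect package (with addServer(), connectApiUser() and deployApp()), a Connect version that issues API keys, and three environment variables whose names (CONNECT_SERVER, CONNECT_USER, CONNECT_API_KEY) are made up for this example.

```r
# Sketch only: the environment variable names are hypothetical and would be
# set as secrets in the CI tool.
deploy_from_ci <- function(app_dir,
                           app_name = basename(normalizePath(app_dir)),
                           server_name = "connect") {
  server_url <- Sys.getenv("CONNECT_SERVER")
  account    <- Sys.getenv("CONNECT_USER")
  api_key    <- Sys.getenv("CONNECT_API_KEY")

  if (!nzchar(server_url) || !nzchar(account) || !nzchar(api_key)) {
    stop("CONNECT_SERVER, CONNECT_USER and CONNECT_API_KEY must all be set")
  }

  # Register the server and the publishing account for this R session.
  rsconnect::addServer(url = server_url, name = server_name)
  rsconnect::connectApiUser(
    account = account,
    server  = server_name,
    apiKey  = api_key
  )

  # Push the checked-out app directory as a new bundle.
  rsconnect::deployApp(
    appDir         = app_dir,
    appName        = app_name,
    account        = account,
    server         = server_name,
    forceUpdate    = TRUE,
    launch.browser = FALSE
  )
}

# In the CI job, after checkout, something like:
# deploy_from_ci("path/to/app")
```

Failing early when a variable is missing keeps a misconfigured CI job from getting as far as a half-registered server.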

Tips for using Connect with scheduled data updates:

It is increasingly common for Shiny applications to be tied to regularly updated data. There are a few approaches to achieving this goal in Connect. All the approaches have one pattern in common: in the app, read data from a shared directory on the Connect server. This pattern might feel odd if you're used to including all the application data alongside the app code in the directory that is published to Connect.
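
A minimal sketch of that pattern on the app side, using Shiny's reactiveFileReader. The file name daily_summary.csv and the helper names are invented for illustration, and /app-data stands for any directory the Connect RunAs user can read.

```r
# Hypothetical shared file; any path readable by the Connect RunAs user.
data_path <- "/app-data/daily_summary.csv"

# Read function handed to reactiveFileReader; it receives the file path.
read_shared_data <- function(path) {
  utils::read.csv(path, stringsAsFactors = FALSE)
}

make_app <- function(path = data_path) {
  ui <- shiny::fluidPage(
    shiny::titlePanel("Shared data"),
    shiny::tableOutput("preview")
  )

  server <- function(input, output, session) {
    # Check the file's modification time every 60 seconds and
    # re-read it only when it has changed.
    shared_data <- shiny::reactiveFileReader(
      intervalMillis = 60 * 1000,
      session        = session,
      filePath       = path,
      readFunc       = read_shared_data
    )

    output$preview <- shiny::renderTable(utils::head(shared_data()))
  }

  shiny::shinyApp(ui, server)
}

# The last line of app.R would then be: make_app()
```

Because the path is an argument, the same app can point at a local copy of the data during development and at the shared directory once deployed.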

Specifics:

  1. Continue to use cron to update data files. Place the updated files in a directory accessible to the Connect RunAs user (such as /app-data). Have the Shiny app use reactiveFileReader to read the data from the shared location.

  2. Instead of using cron, use Connect's job scheduler to update the shared data. To do so, place the code that updates the shared data inside an RMD and publish the RMD to Connect. Schedule the RMD using Connect's scheduler. I really like this approach because I can have the RMD include documentation for the scheduled process, summary information on what the scheduled task did, and I can even have Connect email me the resulting RMD every time the schedule runs OR the log if the schedule fails. I also like this approach because I get all the benefits of deploying the code for my schedule to Connect: package management, versioned bundles, and logs. Because RMD can include other code chunks (like Python, bash, and SQL) I can also include these tools in my scheduled tasks!
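
Whichever scheduler runs the update, the writing side can be sketched the same way. The helper below is hypothetical (its name, and the file name in the usage comment, are made up). It writes to a temporary file in the target directory and then renames it over the old file, so that on Linux an app polling the file should not pick up a half-written copy.

```r
# Hypothetical helper for a cron script or a scheduled RMD chunk.
write_shared_data <- function(df, path) {
  # The temporary file lives in the same directory as the target, so the
  # rename below stays on one file system.
  tmp <- tempfile(pattern = ".tmp-", tmpdir = dirname(path), fileext = ".csv")
  utils::write.csv(df, tmp, row.names = FALSE)

  # Replace the old file in a single step; clean up if the move fails.
  if (!file.rename(tmp, path)) {
    unlink(tmp)
    stop("Could not move ", tmp, " to ", path)
  }
  invisible(path)
}

# In the scheduled job, something like:
# write_shared_data(new_data, "/app-data/daily_summary.csv")
```

The user running the job needs write access to the shared directory, and the Connect RunAs user needs read access to the resulting file.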

Resources:

  1. Example of a scheduled RMD that updates a Shiny app. The RMD uses Python to scrape a Twitter feed, and the Shiny app visualizes the result, using reactiveFileReader to update automatically. http://solutions.rstudio.com/twitter_etl/

  2. In this talk I demo a dashboard that has 3 components: scheduled data that is updated every day, real-time data that is updated every 5 seconds, and user interactions. The daily data is updated with a scheduled RMD that calls an API.
    https://www.rstudio.com/resources/videos/dashboards-made-easy/

  3. Detailed example coming soon to RViews!

Best,

Sean Lopp
(Connect Product Team)


Thanks Sean!

My biggest concern with having separate data / app directories is that it makes it hard for users to get the same behavior on RStudio Connect that they would when they're on RStudio Server.

I need to keep things very simple there, as many publishers will be new to/unfamiliar with the Linux environment.

Also, if users have many apps interacting (such as an Rmd with a Shiny app), there is a significant risk that the left hand doesn't know what the right hand is doing.

If at all possible I don't want people breaking their app into pieces in order to use it on the Connect server, because although I have a sense for the topology, they won't.

Keeping apps encapsulated means that we can restructure at any time and the users won't even notice. (As we did when we migrated in the first place)

I really like using Rmd as the script generator; that's something I do for many of my models, and the YAML standard is just dead useful for configuration details. I would prefer to be able to do this from within an app, though, so that we have visibility. For example, if we needed to delete the app, we could do it in one place without misplacing a vestigial kidney, metaphorically speaking.

Best,
Nick