Advice on an RStudio Connect infrastructure


#1

Hi,

I'm writing not with a technical problem, but to get your opinion on the best way to organize our project on RStudio Connect.

We have an .Rmd file that must run every Monday. It aggregates data from Google Sheets, from JSON files fetched over HTTP, and from csv/xlsx files stored locally on the server. These csv/xlsx files are updated weekly (deposited via FTP by an automatic service). The Rmd generates RDS files.

These RDS files are then used by a Shiny application.

At the moment we have to:

  • manually knit the Rmd
  • move the generated RDS files to the Shiny application folder
  • republish the application

Ideally we would not have to republish the application (the data would be updated dynamically) and, if possible, would not even have to knit the Rmd every Monday. In short, everything would run on its own.

What solution would you suggest for this situation?


#2

This is how I would do it, given that you have an RStudio Connect server.

An Rmd document can be pushed to RStudio Connect as source code, and you can schedule it there. If your RStudio Connect server has internet access, reaching Google Sheets and JSON over HTTP is no problem.
For the csv and xlsx files, I would either put them on shared storage accessible from RStudio Connect, or put them on an FTP server, or make them downloadable over HTTP from somewhere.

The same goes for the resulting RDS files. I would either write them to disk on shared storage, or upload them to an FTP or HTTP service (like S3).
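As a rough sketch of the write step at the end of the Rmd: this assumes a shared directory whose path is given by a hypothetical `SHARED_DATA_DIR` environment variable. That variable, the `publish_rds` helper and the `weekly` data frame are names I made up for illustration, not something RStudio Connect provides. The file is written under a temporary name and then renamed, so the app is unlikely to pick up a half-written file.

```r
# Hypothetical location: a directory that both the scheduled Rmd and the
# Shiny app can reach. Falls back to the session temp dir for local testing.
shared_dir <- Sys.getenv("SHARED_DATA_DIR", unset = tempdir())

# Save `object` as <name>.rds in `dir`. The data is first written to a
# temporary file in the same directory, then renamed into place.
publish_rds <- function(object, name, dir = shared_dir) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  target <- file.path(dir, paste0(name, ".rds"))
  tmp <- tempfile(pattern = paste0(name, "-"), tmpdir = dir, fileext = ".tmp")
  saveRDS(object, tmp)
  if (!file.rename(tmp, target)) {
    unlink(tmp)
    stop("could not move ", tmp, " to ", target)
  }
  invisible(target)
}

# Example call; `weekly` stands in for the data the Rmd has aggregated.
weekly <- data.frame(week = as.Date("2020-01-06"), value = 42)
publish_rds(weekly, "weekly")
```

On an FTP or S3 setup the `saveRDS()` call would stay the same and only the final "move into place" step would become an upload.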

Once these files are in a shared location, the Shiny application can read them, either from the shared storage accessible from RStudio Connect (same server for the Rmd and the Shiny app) or from a URL (FTP, HTTP, ...).
A plumber API could also serve the data as JSON to the Shiny app on request. However, I don't think that would be an improvement.

With this in mind, your current process would become fully automated. You won't need to bundle the data with the Shiny app, so there is no republishing. Everything would run automatically.
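On the Shiny side, a minimal sketch could look like the following. It assumes the RDS file sits in a directory given by a hypothetical `SHARED_DATA_DIR` environment variable and is called `weekly.rds`; both names are placeholders of mine. `reactiveFileReader()` from shiny checks the file's modification time at a fixed interval and re-reads it only when it has changed, so the running app picks up new data without being republished.

```r
library(shiny)

# Hypothetical path on the shared storage; adjust to wherever the Rmd writes.
rds_path <- file.path(
  Sys.getenv("SHARED_DATA_DIR", unset = tempdir()),
  "weekly.rds"
)

ui <- fluidPage(
  tableOutput("weekly")
)

server <- function(input, output, session) {
  # Check the file every 60 seconds; readRDS() runs again only when the
  # file's modification time has changed.
  weekly <- reactiveFileReader(
    intervalMillis = 60 * 1000,
    session = session,
    filePath = rds_path,
    readFunc = readRDS
  )

  output$weekly <- renderTable(weekly())
}

# Build the app object without launching it; runApp(app) starts it locally.
app <- shinyApp(ui, server)
```

The 60-second interval is arbitrary. For data that changes once a week a much longer interval would do, and the app will show an error until the file exists for the first time.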

It seems to me that the key is where you put the shared data. It must be on shared storage somewhere:

  • A NAS connected to the RStudio Connect server
  • A service like Dropbox or S3
  • An FTP server
  • A database
  • ...

Personally, I use an internal shared disk space accessible from RStudio Connect.

That is how I would do it.

I am really interested to read about others' experiences and thoughts on this subject!


#3

We have an example outlining this process here: https://rviews.rstudio.com/2017/11/15/shiny-and-scheduled-data-r/