ETL with R Markdown on RStudio Connect - Output files

This blog is a how-to guide for leveraging R Markdown output files to create a basic ETL process that feeds other data products also hosted on RStudio Connect.

This is what I built to demonstrate the workflow:

Here is a link to the code template: https://gist.github.com/kellobri/f026e30f672b1ab8d6eb2c2f2b91deeb

More information about R Markdown output files can be found in the RStudio Connect user guide.
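For readers who want the gist of the pattern without opening the template, here is a minimal sketch of the ETL side, based on my reading of the user guide rather than copied from the template. The data frame `cleaned` and the file name `daily-summary.csv` are placeholders; the call to `rmarkdown::output_metadata$set()` with `rsc_output_files` is what declares the file as an output file.

```r
# Goes inside an R chunk of the R Markdown report deployed to Connect.
# `cleaned` and "daily-summary.csv" are placeholder names for illustration.
cleaned <- data.frame(id = 1:3, value = c(10.5, 20.1, 30.7))

# Write the result next to the rendered report.
write.csv(cleaned, "daily-summary.csv", row.names = FALSE)

# Declare the file as an output file so that Connect serves it
# alongside the rendered report.
rmarkdown::output_metadata$set(rsc_output_files = list("daily-summary.csv"))
```

If the report is scheduled, each run should overwrite the file, so downstream content picks up fresh data.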

Please let me know if you try building something similar; I'd love to hear about it!

Hi Kelly! I have my permissions set to "anyone - no login required" on the markdown. This is the error I get on the Shiny app:

2019/04/24 15:37:56.864351336 Error in value[3L] :
2019/04/24 15:37:56.864366436 path[1]="https://rconnect.eqt.com/content/43/hist_snap.feather": No such file or directory
2019/04/24 15:37:56.864417636 Calls: local ... tryCatch -> tryCatchList -> tryCatchOne ->
2019/04/24 15:37:56.864419436 Execution halted

Thanks @rexeven!
Can you share your Shiny code as well (or at least the relevant bits)?

library(shiny)
library(tidyverse)
library(dashboardthemes)
library(shinydashboard)
library(gt)
library(ggplot2)
library(feather)
library(httr)
library(data.table)

options(scipen=999)

source(here::here('theme_EQT_gradient.R'))

hist_snap <- read_feather('https://rconnect.eqt.com/content/43/hist_snap.feather')

hist_snap2 <- read_feather('https://rconnect.eqt.com/content/43/hist_snap2.feather')

It's odd; I can navigate to var/lib/rstudio-connect/reports...appropriate version/bundle/etc. and see the .feather files. I can also put the HTTPS address in my browser, and it prompts me to download the file.

Yeah that's really curious - and you double checked that the URLs you're using there are correct? You pulled it straight from the "Open Solo" location?

Content deployed to http://connect.mycompany.com/content/42/ will have its output files available under that URL path. An output file named daily-summary.csv will be available at the URL http://connect.mycompany.com/content/42/daily-summary.csv.
The URL for your content is the same as its “Open Solo” location and is available in the RStudio Connect dashboard.
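To make the URL rule above concrete, here is a small sketch of how a consuming app could build the address and read the file. The helpers `output_file_url()` and `read_output_csv()` are hypothetical names made up for illustration, and the read is assumed to work only when the content is viewable without login.

```r
# Hypothetical helper: join a content ("Open Solo") URL and an output file name,
# tolerating a trailing slash on the content URL.
output_file_url <- function(content_url, file_name) {
  paste0(sub("/+$", "", content_url), "/", file_name)
}

# Hypothetical helper: read a CSV output file straight from its URL.
# read.csv() accepts a URL in place of a local path.
read_output_csv <- function(content_url, file_name) {
  read.csv(output_file_url(content_url, file_name))
}

url <- output_file_url("http://connect.mycompany.com/content/42/",
                       "daily-summary.csv")
```

In a Shiny app, something like `read_output_csv()` would be called with the report's own "Open Solo" URL.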

I did. And the fact that putting that exact address into my browser prompts me to download the file leads me to believe it is correct. I feel like it is perhaps a configuration issue. Should my RConnect config file have a [Data Dir] entry? Mine does not, but perhaps it falls back to a default if it is not specified?

Also, I should mention: I was able to create a folder on the RConnect server that my team can all write to with a markdown document, and then read from in a Shiny app. So the concept itself is working. I just don't like that method, as we will surely run into issues when people save files with the same names. I also don't want to manage creating a folder for every Shiny app to prevent overwriting. It seems the HTTP call is the issue, but only from RConnect.

Yep - absolutely. This paradigm feels cleaner to me than the persistent storage solution for all the reasons you've noted. But only if we can get it working!

My colleague @cole mentioned that we have seen networking issues and situations where Connect has been configured in such a way that the server cannot address itself by its DNS name. If you want to open a support ticket on this issue, we would be happy to work through it with you. I think it might be worth getting some logs and learning about your config and proxy setup. If you do open a ticket, please reference this community post so that our support team can pull me in on it.

I opened a ticket, thanks!

Via the support ticket created, we found that this was actually due to an issue with the feather package. There's an open bug report for this here: https://github.com/wesm/feather/issues/231
We hope this will be resolved soon. In the meantime, should anyone else run into this issue, we recommend writing and reading .csv files instead to see whether that resolves it.

Note that until the issue is resolved, it is also possible to download the feather file to a temporary or local location before reading it.
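A minimal sketch of that download-first workaround, assuming the file is reachable without authentication. The helper name `read_remote_feather()` is made up for illustration; `download.file()` and `feather::read_feather()` are the only real calls involved.

```r
# Hypothetical helper: fetch a remote .feather file to a temporary file,
# then read it from disk, since read_feather() expects a local path.
read_remote_feather <- function(url) {
  tmp <- tempfile(fileext = ".feather")
  # mode = "wb" keeps the binary file intact on Windows.
  utils::download.file(url, destfile = tmp, mode = "wb", quiet = TRUE)
  feather::read_feather(tmp)
}
```

With this in place, the line from the app above would become `hist_snap <- read_remote_feather('https://rconnect.eqt.com/content/43/hist_snap.feather')`. The temporary file is left for R to clean up when the session ends.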
