Using a single Rmd to generate multiple output files that don't get overwritten on new runs?

joseepoirier · February 24, 2020, 3:42pm

Hi,
I want to run a report weekly and keep each output file, so that we build up an archive of past weekly reports.

The idea is to use an Rmd and schedule it to run every week, but by default it seems to overwrite the previous week's report. I tried generating HTML output files with unique names (i.e. with a date in the name) so they wouldn't be overwritten, but that didn't work.

Any thoughts on how to programmatically reuse an Rmd to generate and store multiple output files over time?

mattwarkentin · February 24, 2020, 4:05pm

Hi @joseepoirier,

Can you describe how you tried to generate uniquely named files, and how it failed? Using the output_file argument of rmarkdown::render() should do the trick.

For example, if I save this R Markdown file as report.Rmd:

---
output: html_document
title: Weekly Report
author: me
date: "`r Sys.Date()`"
---

Today's date is `r Sys.Date()`

I can run

rmarkdown::render('report.Rmd', output_file = glue::glue('report_{Sys.Date()}'))

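Each time you run the R code above, it produces a report whose file name is the prefix "report" followed by the current date (e.g. report_2020-02-24.html); render() adds the .html extension itself because the output format is html_document. Does this help?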
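If the weekly run happens outside Connect (for example a cron job calling Rscript), the same idea can be wrapped in a small script that also keeps the archive in its own folder and refuses to overwrite an earlier run. This is a minimal sketch, assuming a hypothetical archive/ directory next to report.Rmd; archive_path() and render_to_archive() are illustrative names, not part of rmarkdown.

```r
# Hypothetical helper: build a dated path such as "archive/report_2020-02-24.html".
archive_path <- function(date = Sys.Date(), dir = "archive", prefix = "report") {
  file.path(dir, paste0(prefix, "_", format(date, "%Y-%m-%d"), ".html"))
}

# Render report.Rmd into the archive folder, refusing to overwrite an
# existing report so that earlier runs are always preserved.
render_to_archive <- function(input = "report.Rmd", date = Sys.Date()) {
  out <- archive_path(date)
  if (file.exists(out)) {
    stop("Report already exists, not overwriting: ", out)
  }
  dir.create(dirname(out), showWarnings = FALSE, recursive = TRUE)
  rmarkdown::render(input,
                    output_file = basename(out),
                    output_dir = dirname(out))
}
```

A weekly scheduler would then only need to source this file and call render_to_archive(); each run adds one new dated file to archive/ and leaves the older ones untouched.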

joseepoirier · February 24, 2020, 9:49pm

Hi Matt,

Thanks for your response. You're right: I have been able to generate uniquely named files by setting output_file through a custom knit function in the YAML:

---
output: html_document
title: Weekly Report
author: me
date: "`r Sys.Date()`"
knit: (function(inputFile, encoding) { rmarkdown::render(inputFile, encoding = encoding, output_file = file.path(dirname(inputFile), paste0("Report_", Sys.Date(), ".html"))) })
---

The problem arises when I publish report.Rmd to RStudio Connect: Connect doesn't create report_{date}.html and instead names the output after the Rmd (report.html). I tried adding

rmarkdown::render('report.Rmd', output_file = paste0("report_", Sys.Date(), ".html")) as a code chunk at the end of report.Rmd and then publishing it to RStudio Connect.

What does seem to work is keeping the output_file option in the YAML, knitting the Rmd locally, and then deploying the HTML file to RStudio Connect with rsconnect::deployDoc("report_2020-02-24.html"). RStudio Connect then lists the output file as "report_2020-02-24.html", although its URL is company.com/content/{number}/report.html. But the goal was to leave report.Rmd on RStudio Connect, schedule it to run every day, and have the daily report_{date}.html files all stored under Content.

Am I missing something?

cole · February 24, 2020, 10:20pm

This is a fantastic question!! I think the small piece you are missing is that each time Connect renders content (for instance, once a day), it stores the whole output bundle, including all associated output files, as its own object. These past renderings can then be browsed interactively (navigate to "History" and then pick an older run time):

The URLs for each of these items look like: myconnect.com

The "revision" part of the URL points to a specific output bundle (and all associated output files).

For example, day one may be at:

while day two is at a different _rev URL:

Because of this, you don't actually have to name files uniquely for this to work.

Further, while rendering the report, you have access to the revision information in your code through environment variables (described here):

https://docs.rstudio.com/connect/user/rmarkdown/#r-markdown-including-urls
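Inside the Rmd, a chunk along these lines could record which rendering produced the page. It is a sketch: the variable names RSC_REPORT_RENDERING_URL and RSC_REPORT_URL are taken from the Connect documentation linked above and may differ between Connect versions, so check them against your server. When the document is knitted locally they are unset, hence the fallback; connect_url() is a hypothetical helper.

```r
# Read a Connect-provided URL, falling back to a placeholder when the
# document is knitted locally (the variable is then unset or empty).
connect_url <- function(var, fallback = "(not rendered on Connect)") {
  value <- Sys.getenv(var, unset = "")
  if (nzchar(value)) value else fallback
}

rendering_url <- connect_url("RSC_REPORT_RENDERING_URL")  # this specific run
report_url    <- connect_url("RSC_REPORT_URL")            # latest version

cat("This rendering:", rendering_url, "\n")
cat("Latest version:", report_url, "\n")
```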

Unfortunately, there is currently no great way to list these output bundles programmatically, which may be what you're after. Could you describe what you're aiming for? What do you plan to use these HTML documents for over time, and how will they be referenced?

If you want to decouple the output HTML from the Rmd that produced it, you could also deploy the HTML as its own static content item on Connect (perhaps using the experimental connectapi package to script the deployment, or by calling the Connect API directly).
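One way to do that without connectapi is a loop over rsconnect::deployDoc(), which forwards extra arguments such as appName and appTitle to rsconnect::deployApp(). This is a sketch under assumptions: rsconnect is already configured with an account on your Connect server, and deploy_archived_reports() is a hypothetical helper whose deploy argument can be swapped for testing.

```r
# Deploy each archived HTML file as its own static content item, so every
# report keeps a stable URL independent of the Rmd's rendering history.
# Assumes rsconnect already has an account configured for your server.
deploy_archived_reports <- function(files, deploy = rsconnect::deployDoc) {
  for (f in files) {
    # e.g. "archive/report_2020-02-24.html" -> content name "report_2020-02-24"
    name <- tools::file_path_sans_ext(basename(f))
    deploy(f, appName = name, appTitle = name)
  }
  invisible(files)
}

# Example (hypothetical folder):
# deploy_archived_reports(list.files("archive", pattern = "\\.html$", full.names = TRUE))
```

Using the dated file name as the content name means redeploying the same day's file updates that item rather than creating a duplicate.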

EDIT: Formal docs on this topic are here: Posit Connect Documentation Version 2024.02.0 - Report History

joseepoirier · February 25, 2020, 11:33pm

Thanks, Cole. I knew about the neat History feature, but it didn't quite do what I wanted, for two reasons:

- The unique titles indicated which segment the report covered when not all products/customers were included, and that segment can change regularly (i.e. although a single Rmd builds all the output HTML files, their contents differ from day to day).
- Digging through the history made it hard for stakeholders to find the report they wanted, especially a report for a specific segment.

I played with my original setup and got it working the way I wanted. I may be pushing the boundaries of what RStudio Connect was designed for, but it meets our needs well. Bullet-point version:

- Rmd #1 determines whether an HTML report (i.e. one for stakeholders) should be generated and, if so, for which segment(s).
- If a report should be generated, Rmd #1 renders Rmd #2 for each segment. Each time Rmd #2 is knitted, it outputs a uniquely named HTML file, which I list as an output file of Rmd #1.
- Rmd #1 is scheduled to run daily and creates a log of which HTML reports were generated that day.
- The Rmd #1 log includes the URLs of the output files so they are easy to find in RStudio Connect.
- In Rmd #1 I customize the daily emails so that they are sent only when a new HTML/stakeholder report is available, and each email includes the new HTML reports as attachments.
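The driver logic described in the bullets above might look roughly like the chunk below. This is a hypothetical sketch, not the author's code: segment_report.Rmd stands in for Rmd #2 and is assumed to declare a segment parameter, and the segments to run would come from your own business logic. The output metadata fields (rsc_output_files, rsc_email_attachments, rsc_email_subject, rsc_email_suppress_scheduled) are the ones the Connect documentation describes for extra output files and custom emails; verify them for your Connect version.

```r
# Render Rmd #2 once per segment into a uniquely named HTML file (written next
# to the input Rmd) and return a one-row-per-report log. `render_fun` defaults
# to rmarkdown::render but can be swapped out, e.g. for testing.
render_segment_reports <- function(segments, date = Sys.Date(),
                                   input = "segment_report.Rmd",
                                   render_fun = rmarkdown::render) {
  day <- format(date, "%Y-%m-%d")
  files <- character(0)
  for (s in segments) {
    out <- sprintf("report_%s_%s.html", s, day)
    # A fresh environment keeps objects from one segment out of the next.
    render_fun(input, output_file = out,
               params = list(segment = s), envir = new.env())
    files <- c(files, out)
  }
  data.frame(date = rep(day, length(files)),
             segment = as.character(segments),
             file = files,
             stringsAsFactors = FALSE)
}

# Register the reports with Connect and customise the scheduled email.
publish_log <- function(log) {
  if (nrow(log) == 0) {
    # Nothing new today: skip the scheduled email entirely.
    rmarkdown::output_metadata$set(rsc_email_suppress_scheduled = TRUE)
  } else {
    rmarkdown::output_metadata$set(
      rsc_output_files = as.list(log$file),
      rsc_email_attachments = as.list(log$file),
      rsc_email_subject = sprintf("%d new report(s) for %s",
                                  nrow(log), log$date[1])
    )
  }
  invisible(log)
}
```

In Rmd #1 these could be called from a chunk as log <- render_segment_reports(segments_to_run) followed by publish_log(log), and knitr::kable(log) would print the day's log. Adding a column of relative links such as sprintf("[%s](%s)", log$file, log$file) should point at the files stored with that same rendering, though that is worth confirming on your server.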

I'll try to write up a blog post on it with more detail if others might find it useful.

cole · March 20, 2020, 12:09pm

Apologies for the delayed response, @joseepoirier! Thanks for sharing this; it is very helpful. We would love to see more detail (and I'm sure others would as well), so please let us know if and when you have a chance to write it up! A post in R Admins / RStudio Connect linking to your blog post would be very welcome!

Please let us know if you have other thoughts / ideas / questions / feedback for us!!