@aron I ended up doing something similar to what you suggested. I'm not a fan of modifying environments this way, so if you have any suggestions to clean this up, I'm all ears!
The implication of this approach is that the first API request after the data source changes will be slower (perhaps significantly) than all others, because that request has to download the new data. Depending on the size of the data, it may even run into HTTP timeouts.
ETL.rmd
---
title: "ETL"
author: "Jeff Keller"
date: "2019-06-14"
output: html_document
rmd_output_metadata:
  rsc_output_files:
    - "output/constant.rds"
---
This document represents an ETL process. In this minimal example, the output of the process is just a random number between 1 and 10. The output is available [here](output/constant.rds).
```{r etl}
constant <- sample(1:10, size = 1)
dir.create("output", showWarnings = FALSE)
saveRDS(constant, file = "output/constant.rds")
```
The current value of `constant` is `r constant` and it changes every 5 minutes.
plumber.R
#' @apiTitle ETL Example
#' @apiDescription An example plumber API that loads data from a scheduled RMarkdown document (https://connect.example.com/etl-poc/etl.html)
library(plumber)
library(httr)
# Load the API_KEY environment variable
API_KEY <- Sys.getenv("API_KEY")
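# Cached data and the Last-Modified value it was loaded at. plumber evaluates
# this file in its own environment, so these live there, not in .GlobalEnv.
constant <- NULL
last_mtime <- NULL
#' Add a number to the current constant
#' @get /add
#' @param x A number to add to the constant
add <- function(x) {
  loadData()
  return(as.numeric(x) + constant)
}
# Reloads `constant` if the data source has changed since the last load
loadData <- function() {
  # A HEAD request fetches only the response headers, not the data itself
  resp <- httr::HEAD(
    url = "https://connect.example.com/etl-poc/output/constant.rds",
    add_headers(Authorization = paste("Key", API_KEY))
  )
  mtime <- headers(resp)[["last-modified"]]
  if (!identical(mtime, last_mtime)) {
    new_constant <- readRDS(
      file = url(
        description = "https://connect.example.com/etl-poc/output/constant.rds",
        headers = c(Authorization = paste("Key", API_KEY))
      )
    )
    # `<<-` updates the variables defined at the top of this file. Assigning
    # into .GlobalEnv instead would leave the `last_mtime` above stuck at NULL,
    # so the data would be downloaded again on every request.
    constant <<- new_constant
    last_mtime <<- mtime
  }
}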
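A possible way to keep the cached state out of shared environments altogether is to hold it in a closure. The following is only a minimal sketch, not something from plumber or httr: `make_cached_loader()`, `get_version` and `load_value` are hypothetical names, and the two stand-in functions replace the `HEAD` request and the `readRDS()` download so the example runs without a network connection.

```r
# Hypothetical helper (not part of plumber or httr): returns a function that
# keeps the cached value and its version in a private closure environment.
make_cached_loader <- function(get_version, load_value) {
  cached_version <- NULL
  cached_value <- NULL
  function() {
    version <- get_version()
    if (!identical(version, cached_version)) {
      # `<<-` updates the closure's own variables, not .GlobalEnv.
      # The version is stored only after a successful load, so a failed
      # load is retried on the next call.
      cached_value <<- load_value()
      cached_version <<- version
    }
    cached_value
  }
}

# Stand-ins for the Last-Modified header and the readRDS() download
source_version <- "Fri, 14 Jun 2019 12:00:00 GMT"
n_loads <- 0
get_constant <- make_cached_loader(
  get_version = function() source_version,
  load_value = function() {
    n_loads <<- n_loads + 1  # count how often the "download" happens
    42
  }
)

first <- get_constant()   # nothing cached yet, so the value is loaded
second <- get_constant()  # version unchanged, served from the cache
source_version <- "Fri, 14 Jun 2019 12:05:00 GMT"
third <- get_constant()   # version changed, so the value is loaded again
```

In `plumber.R`, `get_version` would presumably wrap the `HEAD` request and `load_value` the `readRDS()` call, with the endpoint calling `get_constant()` instead of reading a shared `constant`. Only the first call after a version change pays the loading cost, which is the same trade-off described above.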