Global reactivePoll data on RStudio Connect caching

Hello, I have a few sets of R Markdown files on RStudio Connect that update nightly or hourly. In my Shiny apps I then read the data globally from S3 via reactivePoll. The interval to check for new data is set to an hour, i.e. 3600000 milliseconds. The check function reads a timestamp file that each markdown run writes when it finishes, to track when the data was last updated. I print the timestamp when the check function runs, so I can see that it picks up the newest file date, but the app never actually reloads the file; I don't see the print line from the value function firing.

If I hit the app enough times, or re-deploy it entirely, the data will refresh, so it seems like something is still caching. It's hard to recreate a minimal example of this, so it's a long shot that I'll get help, but does anything stand out as completely wrong with how I'm using reactivePoll? Thoughts very welcome, thank you!

#Functions defined in global.R:
load_fst_file_rp <- function(file_name) {
  # session = NULL because this poll is created at global scope,
  # outside of any particular Shiny session
  reactivePoll(3600000, NULL,
               checkFunc = check_upload_timestamp_fst,
               valueFunc = function() {
                 load_fst_file(file_name)
               })
}

check_upload_timestamp_fst <- function() {
  # Download the timestamp file written at the end of each markdown run
  timestamp <- save_object("timestamp.fst", bucket = bucket_name,
                           file = "timestamp.fst", check_region = FALSE) %>%
    read_fst()
  ts <- timestamp$ts
  print(paste0("Checking upload timestamp fst: ", ts))
  ts
}

load_fst_file <- function(file_name) {
  print(paste0("Loading: ", file_name))
  save_object(file_name, bucket = bucket_name, file = file_name,
              check_region = FALSE) %>%
    read_fst()
}

#app.R
source("global.R")

data <- load_fst_file_rp("data.fst")

ui <-  function(request) {
  #UI Stuff
}

server <- function(input, output, session) {
  
  output$dataTable <- DT::renderDataTable({
    
    data()
    
  })
  
}

# Run the application
shinyApp(ui = ui, server = server)

Wild guess following...

It seems to me you're reading the data in the global section of your app. Anything in global will only execute at app startup time.

Does it make any difference to your app when you move the line

data <- load_fst_file_rp("data.fst")

into the server section?
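
For example (a sketch; this reuses the load_fst_file_rp helper from your global.R, so the reactivePoll is created per session inside a live reactive context):

server <- function(input, output, session) {
  # Creating the poll here ties it to the session's reactive context,
  # so the check/value functions are actually flushed while a user
  # is connected
  data <- load_fst_file_rp("data.fst")

  output$dataTable <- DT::renderDataTable({
    data()
  })
}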

Hi @andrie,

I have apps that load data in the server as you say, and those seem to work fine. In a select few apps I have one large data set that I want shared across sessions; this data is updated nightly. I realize things in the global context only load once, but I thought using reactivePoll would still check for new data at the specified interval and update the data when the check function invalidated. Is that incorrect?

Thanks!
Heather

Hey @hlendway,

Thanks for sharing this use case! I find this type of thing very intriguing and useful on RStudio Connect. I believe reactivePoll will not fire this way in the global space because it is "outside" of the reactive context or something like that.

In any case, one option is to throw that reactivePoll into the server function, which will get it to run only when someone is connected to it. Then you can use super-assignment to clobber the global object... (<<-) :grimacing: Not exactly sure what that would do for other users who are on the app at the same time. Also, it would only run when someone is connected to the app.
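
A rough sketch of that pattern might look like this (hypothetical; it assumes the check_upload_timestamp_fst and load_fst_file helpers from global.R, and the caveats above about concurrent sessions still apply):

server <- function(input, output, session) {
  # Poll inside the server so it runs in a live reactive context
  polled <- reactivePoll(3600000, session,
                         checkFunc = check_upload_timestamp_fst,
                         valueFunc = function() load_fst_file("data.fst"))

  observe({
    # Super-assign to clobber the process-global copy, so other
    # sessions on the same R process can pick up the refreshed data;
    # unclear exactly how this behaves for concurrently connected users
    data <<- polled()
  })

  output$dataTable <- DT::renderDataTable({
    polled()
  })
}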

If you definitely need a "preparatory ETL" to run every hour, whether someone is connected to the app or not, this pattern might work for you (it has worked for me) :smile:

I think generally speaking app developers are best served by avoiding patterns like this: pre-aggregate into a database, a pin, or something else that can be accessed in real time without hurting latency. However, there are times when those (preferred) patterns are (1) too much work, (2) not accessible due to access restrictions, or (3) a poor fit for some other reason.

I hope that helps!! I'm definitely curious to hear more about the size of your dataset and such, and whether this type of pattern works well for the performance of your app across multiple sessions on the same R process. (You can test this yourself by setting "MaxProcesses = 1" or the load factor sufficiently high, and connecting to the app in multiple browser windows).


Hi @cole, thank you for the ideas, I really appreciate it! For this particular use case I realized the data set is small enough that loading it on every connection turned out not to be horribly slow. Initially it loaded nightly, but I was recently asked to refresh hourly. For any of our database connections it's typically not ideal to connect live, for a few reasons: 1. the database owners don't want us accidentally taking down production databases or adding traffic at peak times; 2. certain databases are painfully slow during the workday and frequently have outages due to high traffic, so it's not always reliable to connect during the work day.

I'm curious to test out this solution on my other apps with larger data sets, which range anywhere from 100K rows to millions. Always happy to share more about our use cases :slight_smile: Thanks as always for the ideas on how to implement this!


Thanks so much for sharing! That's very interesting! We are always happy to learn more about these types of use cases, and I'm sure others in similar scenarios will find them useful as well!! :smile:
