Connect: Programmatic restart application

Hello,

Is it possible to restart an API programmatically with connect?
I have APIs that read in a payload from DB (outside the endpoint code), but would like the API to rerun nightly to get fresh data. Using timeout settings doesn't seem like a reliable approach.

Thanks in advance,
BR Johannes


Very interesting use case!

So you are reading the data into memory for use by the API and want to make sure the data has a certain level of "freshness"? How do you envision that you would handle anyone trying to use the API at the time that a refresh is desired?

Perhaps you imagine specifying the "max_age" of an R process, so if an R process has lived more than a certain amount of time, it should be replaced by a fresh R process?

Today, the only way I can think to address this would be to do one of:

  • minimize the timeout duration and set min_processes to 0
  • manage things at the process level (i.e. have maintenance time in the middle of the night where certain processes are killed if they are older than a certain age)
  • cache the data on disk and read from disk. To do this, you might use a scheduled R Markdown report to write the data in a fast-readable format like feather or to a database optimized for reading.
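A minimal sketch of the third option, assuming the scheduled job and the API share a filesystem path. The sample payload and the path are stand-ins; base-R RDS serialization is used here for portability, with feather (via the arrow package) being the faster format mentioned above:

```r
# Stand-in for the scored payload a nightly ETL would pull from the database.
payload <- data.frame(customer_id = 1:3, score = c(0.1, 0.7, 0.4))

# Nightly job: persist the payload in a fast-readable format.
# (saveRDS is base R; arrow::write_feather would be the faster analogue.)
path <- file.path(tempdir(), "payload.rds")
saveRDS(payload, path)

# API startup: read from disk instead of querying the database.
data <- readRDS(path)
nrow(data)  # 3
```

The API process then only pays the (small) deserialization cost at startup, and the scheduled job controls freshness.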

Thanks for the feedback!
Yes, the payload itself from the DB is scored with a model and written to the DB daily through a Connect ETL. This payload is then read in by the API, some filtering and anti-joins against a customer-filter DB are done live (which is necessary), and one or more results are returned. Reading from the DB every time the API is called is too slow. Setting timeouts and min_processes to 0 is undesirable, as this triggers the DB load on each call, slowing the API down from milliseconds to several seconds.
Caching on disk is not a good scenario for us, both due to GDPR (minimum data retention on disk of sensitive data, better to get from DB), and the difficulty of having on-file disk configurations for Connect through the deploy chain.

Could the rsconnect::restart_app be adapted to be compatible with all Connect assets? We could then have a nightly ETL that restarts all APIs following the DB-update.

Thanks again!

Ah that makes sense. Sounds like you have a thorough architecture! A couple more workarounds that might be feasible:

  • create a plumber endpoint that re-loads the data from the database (e.g. /db-reload). Then you can hit that endpoint periodically (e.g. every night) to ensure the data is updated
  • file a feature request for plumber for something like Shiny's invalidateLater, which could interrupt plumber's waiting for new connections and run the db-reload. I believe I heard somewhere that async operations are coming to plumber - there may be a similar workaround there.
  • The advantage of both of the above is that you do not have to kill the R processes. One option would definitely be to kill the R processes (either implicitly by forcing the R process to exit or explicitly by killing them), but that could create bad experiences for any users of the API during the maintenance window

Thanks for sharing this use case, though! This is definitely good for us to be thinking through as we add features to Connect, specifically the dependency tree of applications, their state, etc. If you do implement a workaround, please share so that we can have record of the desired behavior!

Thanks for the feedback.
The db-reload endpoint seems like a simple and great idea on paper.
So, to elaborate, you would have (in pseudo-R):

# Pre-endpoint data ingest: runs once, when the API process starts
conn <- connect_to_db()
data <- get_data(conn)

#* @get /get_data
get_data_endpoint <- function() {
   data_filtered <- filter_data(data)
   return(data_filtered)
}

#* @get /reload_data
reload_data_endpoint <- function() {
   conn <- connect_to_db()
   data <<- get_data(conn)
}

I guess writing to the enclosing (global) environment with <<- is necessary here for the update to be visible to the other endpoint (which may not be optimal), or is there another pattern that would work?

Thanks a lot

Sorry for the late response here. You're right, I think it would probably require writing to the global object, which is definitely a pattern I like to avoid. However, for database-backed apps, I think it is normal to have the data defined as a global object. You would not necessarily need to redefine the conn object in reload_data_endpoint (since it is already defined globally). Even though Shiny is very different, you might look around the Shiny ecosystem for this type of pattern, since Shiny and Plumber share some similar architecture (i.e. the global object, persistent connections, etc.).
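One pattern that sidesteps `<<-` entirely, sketched here with a hypothetical `load_payload()` standing in for the real database read: keep the mutable state in a dedicated environment, which endpoint functions can update with an ordinary `<-`:

```r
# Stand-in for the real database read.
load_payload <- function() data.frame(customer_id = 1:3, score = c(0.1, 0.7, 0.4))

# A dedicated environment holds the mutable state; no global assignment needed.
state <- new.env(parent = emptyenv())
state$data <- load_payload()

# The body of a /reload_data endpoint would then simply do:
reload_data <- function() {
  state$data <- load_payload()  # ordinary `<-` into the captured environment
  invisible(TRUE)
}

reload_data()
nrow(state$data)  # 3
```

Because environments in R are mutable reference objects, every endpoint that closes over `state` sees the refreshed data without touching the global environment.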

One note: I would recommend using the pool package as an abstraction on top of the database connection and to beware connection timeouts (since you will be leaving this connection open), which I have hit before but not taken much time to think about.
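A sketch of what that could look like; the SQLite driver and in-memory database are placeholders, so swap in your production driver and credentials:

```r
library(DBI)
library(pool)

# pool checks a connection out per query and validates it first, so
# idle-timeout disconnects on a long-lived API process are handled for you.
conn <- pool::dbPool(
  drv = RSQLite::SQLite(),  # placeholder driver; use your production one
  dbname = ":memory:"       # placeholder connection details
)

result <- DBI::dbGetQuery(conn, "SELECT 1 AS ok")

# Close the pool when the API process shuts down.
pool::poolClose(conn)
```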

Also, I am excusing several bad practices in your pseudo code :wink: That's what pseudo code is for, right!? :smiley:

Great, we’re already using pool for our shiny apps, so I’ll give global conns and data a shot. Thanks for the help, appreciated!

