Proper way to replace or update a database table

I want to replace or "refresh" a table in my database. Currently, I do the following:

  1. Delete the table with DBI::dbRemoveTable()
  2. Write a new table with DBI::dbWriteTable()

And this is done daily. However, this results in some downtime while the new table is being written. What is a better way of doing this? How can I avoid downtime and keep the old data available under two conditions:

  1. While the old data is still being used
  2. While the new data is not yet ready, i.e. hasn't been fully written to the database yet

I am using Postgres and have heard about pg_dump, but I am not sure if that's what I want or whether it is readily available via DBI or RPostgres. Another idea I had was passing overwrite = TRUE to DBI::dbWriteTable(), but I am not sure whether that would solve this problem. Any advice would be appreciated!

I think we need some context to give you any meaningful advice. Why do you need to completely replace your table daily? Such a workflow seems odd to me. If you only need to update values, perform an UPDATE; if you are not sure whether you need to insert or update, perform an UPSERT; if you want to get rid of old records (losing data), perform a DELETE; if you want to keep the table small to speed up query execution, use PARTITIONED tables. I'm having a hard time imagining a good reason to DROP and CREATE daily.
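In case it helps, here is a minimal sketch of a Postgres UPSERT (INSERT ... ON CONFLICT ... DO UPDATE) through DBI. The flights_summary table, its carrier key, and the connection details are all made up for illustration; adjust to your schema:

library(DBI)

# placeholder connection; substitute your own driver and credentials
con <- dbConnect(RPostgres::Postgres(), dbname = "mydb")

# insert a row, or update it when the key already exists;
# requires a unique constraint (here assumed on "carrier")
dbExecute(con, "
  INSERT INTO flights_summary (carrier, n_flights)
  VALUES ($1, $2)
  ON CONFLICT (carrier)
  DO UPDATE SET n_flights = EXCLUDED.n_flights;
", params = list("UA", 58665))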

We drop and create data daily because, unfortunately, we have no system in place to detect which data has changed. I've heard pg_dump would be a good solution to this problem, but I'm not totally sure.

Given my situation, could I just use a combination of UPSERT and DELETE on all columns?

pg_dump is used to back up your database: it dumps data as SQL statements that can be restored later. I don't think it is the right tool for this job; also, PostgreSQL specifics are outside of this forum's scope.

That sounds like the use case for UPSERT to me, unless you don't know in advance which columns you need to create, which wouldn't make much sense for a relational database.

Yes, you can, although in that case a DELETE followed by an INSERT will be simpler.
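For example, a sketch of that DELETE-then-INSERT wrapped in a transaction, assuming con is an open DBI connection to Postgres and the flights table from the example further down. Under Postgres's default isolation level, readers keep seeing the old rows until the transaction commits, which also limits the downtime:

library(DBI)

# readers continue to see the old rows until the transaction commits
dbWithTransaction(con, {
  dbExecute(con, "DELETE FROM flights;")
  dbAppendTable(con, "flights", nycflights13::flights)
})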

Thanks, I will take a look at this. I am also thinking I might try:

  1. Write the new data to a table "table_tmp"
  2. Once the new table is written, remove the old table
  3. Use ALTER to rename "table_tmp", effectively replacing the old table

That sounds like it should cut down on downtime considerably (assuming ALTER is a quick operation). For example:

#' Refreshing a table in a database
#'
#' The goal is to replace a table with new data with as little downtime as
#' possible. The method below does this by writing the new data to a table with
#' '_tmp' suffixed to the table's name, keeping the old data intact. Once the
#' '_tmp' table has been written, the old table is removed and the '_tmp'
#' suffix is dropped from the new table's name. Renaming a table with ALTER is
#' a quick, inexpensive operation, so there is very little downtime with this
#' approach.

# connect (dbplyr/dplyr aren't needed here; DBI is called via ::)
con <- DBI::dbConnect(odbc::odbc(), "r4fun")

# write old data
DBI::dbWriteTable(con, "flights", nycflights13::flights)

# write new data with "_tmp" suffix
DBI::dbWriteTable(con, "flights_tmp", nycflights13::flights)

# remove the old table
DBI::dbRemoveTable(con, "flights")

# rename the new table by removing the "_tmp" suffix
DBI::dbExecute(con, "ALTER TABLE flights_tmp RENAME TO flights;")
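One further thought: since Postgres DDL is transactional, steps 2 and 3 could be wrapped in a single transaction so there is no window in which "flights" doesn't exist. A sketch, assuming the same connection as above:

# drop and rename commit together, so readers never observe a
# missing "flights" table (relies on Postgres's transactional DDL)
DBI::dbWithTransaction(con, {
  DBI::dbExecute(con, "DROP TABLE flights;")
  DBI::dbExecute(con, "ALTER TABLE flights_tmp RENAME TO flights;")
})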

Anyway, realizing this is getting a bit out of scope for R-related questions, I'll likely raise follow-up questions elsewhere. Really appreciate the responses, thank you.
