How to do SQL MERGE statement in R?

Recently I have a task of getting data from web API and write them to database. The data are updating everyday. There is a unique identifier column called serial_no and a column called updated_time. So I need to update the database table with data got from web API based on the serial_no column.

It's just like a MERGE statement in SQL Server. I'm wondering how to do this task in R and what's the best practice?

This is more of a database best practice than R best practice :slight_smile:

You did not mention what database backend you are using, and it is certain that the optimal solution will vary somewhat. The code below is using Postgres dialect.

In general I would suggest using two tables - let us call them a staging layer and core layer.

I would start by truncating the stage layer

dbSendQuery(con, "truncate table stg_api_results")

Then insert values of the current apis

db_insert_into(con, "stg_api_results", values = frm_API_values)

And lastly flip the stage over to core via

dbSendQuery(con, "insert into api_results 
                  select * from stg_api_results 
                  on conflict (serial_no) do update 
                  set updated_time = current_timestamp")

Also worth considering (though likely not in your use case) is on conflict do nothing, variant which is helpful in enforcing unique values (only new rows are inserted).

I am assuming an unique constraint on the field serial_no.

1 Like

@jlacko

I 'm using Microsoft SQL Server.

According to your ideas, the workflow would be

  • Create the table in sql with unique constraint
  • Fetch api result
  • Create a table and insert the api result into it
  • Send a sql statement to database to insert and update table as need

I rarely do SQL DML operation with the help of R. And it seems R is not very suitable to do this task. Maybe I shoud use Python to do the similar task next time after knowing more about it.

Thanks for your reply. And I gained more confidence of R.

1 Like

Thanks for your kind comments; the process I suggested is in fact language independent (it is about data architecture & executing SQL commands, not some fancy dplyr pipes). Both R and python will give you comparable results, so use whichever you have the most confidence in.

If you are using MS SQL Server my suggestion would be to set up the two tables - stage for current results and historized one with unique constraint for the end result - via Visual Studio. It is well suited for this kind of work. As is often the case with Microsoft products their tools are well integrated (and somewhat of a pain to access from outside the walled garden).

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.