Pushing data to Amazon Redshift from R

I've tried a couple of solutions, none of which really works, and I'm searching for other options. Here's what I've tried so far:

RPostgreSQL & RPostgres packages - these work well for downloading data from Redshift, but not for uploading data back. That's because Redshift's COPY command is incompatible with Postgres's, but perhaps somebody has found a workaround for this?

Official Redshift ODBC driver (through the odbc package) - again, this works well for downloading data, but uploading to the database (to an already created table) through the DBI package fails. I'm executing the following code:

konrad <- tibble::tibble(konrad = 1)  # data_frame() is deprecated; tibble() is the current equivalent

tbl <- DBI::Id(
  schema = "risk",
  table  = "test3"
)

DBI::dbWriteTable(
  conn = db_read_redshift,
  name = tbl,
  value = konrad,
  overwrite = TRUE
)

where:

  • I have a data frame called 'konrad' with a column 'konrad' that I want to push to an already existing table
  • the existing table is named 'test3', lives in a schema called 'risk', and has an integer column named 'konrad'

and that generates the following error:

Error in connection_copy_data(conn@ptr, sql, value) : 
  Failed to initialise COPY: ERROR:  syntax error at or near "STDIN"
LINE 1: COPY "risk"."test3" ("konrad") FROM STDIN
                                            ^

I found a couple of topics on this, but none of them was really conclusive about what should be done to make this work. Can anyone propose a feasible solution? Thanks!

The deal here is that Redshift does not support the Postgres bulk loader (COPY ... FROM STDIN, which is exactly what dbWriteTable issues in the error above). Redshift requires bulk loads to go through S3 or another AWS service, using its own COPY command.

Here's an artifact from my wrestling with this a few months ago. Roughly: stage the data in S3, then issue Redshift's own COPY. A sketch of that shape, assuming the aws.s3 package and the db_read_redshift connection from above (the bucket name, object key, and IAM role ARN are all placeholders):
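library(aws.s3)
library(DBI)

# 1. Write the data frame to a local CSV
tmp <- tempfile(fileext = ".csv")
write.csv(konrad, tmp, row.names = FALSE)

# 2. Upload the CSV to an S3 bucket you can write to
#    ("my-staging-bucket" is a placeholder)
put_object(file = tmp, object = "staging/test3.csv", bucket = "my-staging-bucket")

# 3. Ask Redshift to bulk-load from S3 with its own COPY command
#    (the IAM role ARN is a placeholder for one that can read the bucket)
dbExecute(db_read_redshift, "
  COPY risk.test3 (konrad)
  FROM 's3://my-staging-bucket/staging/test3.csv'
  IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
  CSV
  IGNOREHEADER 1;
")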


If you have access to S3 and the right permissions to write to your Redshift cluster from S3, you can use something like this:
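A sketch with the redshiftTools package, which stages the data frame in S3 and runs Redshift's COPY for you. This assumes your AWS credentials are available in the environment; the bucket name is a placeholder, and it's worth checking the package docs for the exact arguments:

library(redshiftTools)
library(DBI)

# Replace the contents of risk.test3 with the konrad data frame,
# staging through the (placeholder) S3 bucket
rs_replace_table(
  konrad,
  dbcon      = db_read_redshift,
  table_name = "risk.test3",
  bucket     = "my-staging-bucket"
)

If I remember right, the package also has rs_upsert_table() for updating rows in an existing table instead of replacing it wholesale.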
