RStudio Professional PostgreSQL driver performance

driver
odbc
dbi

#1

Hi

I am using R + PostgreSQL to organize the data at atlas.media.mit.edu

I notice writing to DB from RStudio Server is slow or unresponsive.

This is what I'm doing

if (!require("pacman")) install.packages("pacman")
  p_load(data.table, feather, dplyr, tidyr, doParallel, odbc, DBI)

yearly_list <- list.files("sitc2")

try(dir.create("yearly-data"))
unzip_yearly <- function(t) {system(paste("7z e -aos", yearly_list[[t]], "-oc:yearly-data/"))}
mclapply(1:length(yearly_list), unzip_yearly, mc.cores = n_cores)
yearly_list_feather <- list.files("yearly-data", full.names = T)

yearly_data_full <- mclapply(yearly_list_feather, read_feather, mc.cores = n_cores)

con <- dbConnect(odbc::odbc(), Driver = "PostgreSQL", Server = "oeccube.datawheel.us", Port = "5432", Database = "oeccube", UID = "deploy", PWD = rstudioapi::askForPassword("Database password:"))

for (t in 1:length(yearly_data_full)) {
    x <- yearly_data_full[[t]]
    dbWriteTable(con, "test", x, overwrite = F, append = T)
    message(paste(100 * t/length(yearly_data_full), "% ready"))
  }

I also tested setting t = 1 and then see what happens just for the first stage of that loop and that list (~200 MB) and it runs for more than 24 hrs so I decided just to kill the process.

From the command line this equivalent approach is really fast:

psql -h oeccube.datawheel.us -d oeccube -U deploy -c "\copy sitc_rev2 (year, reporter_iso, partner_iso, commodity_code, export_usd, import_usd, export_kg, import_kg, marker)  from 'sitc2.csv' with delimiter as ',' CSV HEADER"

Any ideas?