Data extraction from large tables.

#1

Hello All,

I am trying to extract data from JOIN of 4 Oracle views. The total number of fields is ~125 and volume of data is ~300MB with 3.3Million records. However performance is not as per the expectations as it took more than 12 hours. I need to execute this for more than 10 GB of data. What approach/package we need to apply to make it more faster and efficient. Please note I have 8 GB of RAM in my system. Any help will be appreciated.

Below is code -

Start_Time <- Sys.time()
Sys.setenv(JAVA_HOME='C:\\Program Files\\Java\\jre7')
options(java.parameters='-Xmx8g')
memory.limit(size=10000000000024)
library(rJava)
library(RJDBC)

drv <- JDBC(driverClass="oracle.jdbc.OracleDriver", classPath="<<PATH_TO_OJDBC.JAR>>")
jdbcConnection <- dbConnect(drv, "<<SCHEMA_DETAILS>>", "<<USER_ID>>", "<<PASSWORD>>")

sqlquery <- "select
ABC.F_1,
PQR.F_2,
JKL.F_3,
.
.
.
XYZ.F_125
FROM ABC, PQR, JKL, XYZ where ABC.PK=PQR.PK AND ABC.PK=JKL.PK AND ABC.PK=XYZ.PK"

res <- dbSendQuery(jdbcConnection,sqlquery)
result <- list()
i=1
result[[i]] <- fetch(res,n=100000)
while(nrow(chunk <- fetch(res, n=100000)) > 0){
  i<-i+1
  result[[i]] <- chunk
}

output_data <- as.data.table(do.call(rbind, result))
save(output_data, file='output_data.RData')
gc()
rm(output_data)
End_Time <- Sys.time()
0 Likes

#2

so without R involved at all, how long does it take for this same query and use spool to chunk it into a text file: https://asktom.oracle.com/pls/asktom/asktom.search?tag=sqlplus-query-output-to-csv-or-txt-format

You won't get R to do it any faster than the native Oracle tools. So that's a starting point. If you find that spool is much faster then you might consider just having R execute the command line functions to fire off the spool job, create a local CSV, then suck that into R using a fast csv reader like read_csv from the readr package

1 Like