Increase performance of function to running large data from oracle database

I'm writing a script using adist library for the name searching but I have some performance issues when I run the script to parse the data from the oracle database that are returned the name. Sometimes the program also will crash.

library("DBI")
library("ROracle")
library("adist")
library("tidyverse")
drv <- dbDriver("Oracle")
host <- "host"
port <- "1521"
sid <- "sid"
connect.string <- paste(
"(DESCRIPTION=",
"(ADDRESS=(PROTOCOL=tcp)(HOST=", host, ")(PORT=", port, "))",
"(CONNECT_DATA=(SID=", sid, ")))", sep = "")

con <- dbConnect(drv, username = "username", password = "pastword", dbname = connect.string, prefetch = FALSE,
bulk_read = 1000L, stmt_cache = 0L, external_credentials = FALSE,
sysdba = FALSE)

cust_det=dbGetQuery(con,"select * from cust_det")

data_name = data.frame(cbind(names,adist(names, "JohnSmith",partial = TRUE, ignore.case=TRUE)))
sorted_name = data_name[with(data_name,order(as.numeric(levels(data_name$V2))[data_name$V2])), ]
data_2=sorted_name %>% filter(V2<=5)

*******edited *******

data_name = data.frame(cbind(cust_det,adist(cust_det, "JohnSmith",partial = TRUE, ignore.case=TRUE)))
data_name$V2=as.numeric(as.character(data_name$V2))
sorted_name=data_name[order(data_name$V2),]
Name_search=sorted_name %>% filter(V2<=5)

So, I want to ask R community:

  1. How to increase the performance of running the script and is there any library to handle large data in R?
    2)How to handle out-of-memory data in R when the data is large scale?

Hi,

Welcome to the RStudio community!

First of all, the data that you get out of the database (cust_det) is not used anywhere in your code. You should provide us with a more comprehensive example and tell us where the issue occurs.

How long does it take R to query the database, and how large is the data that you are pulling in? (rows and columns). If data that is pulled is too large for memory, you can split it up in batches and process it one at the time if that's possible, but again we'd need more info for that on the data and where exactly things go wrong. You can use the profvis package (integrated into R-studio under the profile tab) to run all your code and see where it's slow or high in memory.

Hope this helps,
PJ

1 Like