I'm writing a script that uses R's adist() function for name matching, but I'm running into performance problems when the script processes the name data returned from an Oracle database. Sometimes the program even crashes.
library("DBI")
library("ROracle")
library("adist")
library("tidyverse")
drv <- dbDriver("Oracle")
host <- "host"
port <- "1521"
sid <- "sid"
connect.string <- paste(
"(DESCRIPTION=",
"(ADDRESS=(PROTOCOL=tcp)(HOST=", host, ")(PORT=", port, "))",
"(CONNECT_DATA=(SID=", sid, ")))", sep = "")
con <- dbConnect(drv, username = "username", password = "password", dbname = connect.string, prefetch = FALSE,
bulk_read = 1000L, stmt_cache = 0L, external_credentials = FALSE,
sysdba = FALSE)
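One thing I'm unsure about is whether prefetch = FALSE and bulk_read = 1000L are slowing the fetch down. As far as I understand, these control how many rows ROracle pulls per round trip, so a variant like the following might help (the values here are just a guess, not something I've benchmarked):
# Possible variant: enable prefetching and fetch more rows per round trip
# (50000L is an arbitrary guess, not a tested value)
con <- dbConnect(drv, username = "username", password = "password",
                 dbname = connect.string,
                 prefetch = TRUE, bulk_read = 50000L,
                 stmt_cache = 0L, external_credentials = FALSE, sysdba = FALSE)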
cust_det <- dbGetQuery(con, "select * from cust_det")
# First attempt ('names' is the character vector of customer names, e.g. taken from cust_det)
data_name <- data.frame(cbind(names, adist(names, "JohnSmith", partial = TRUE, ignore.case = TRUE)))
sorted_name <- data_name[with(data_name, order(as.numeric(levels(data_name$V2))[data_name$V2])), ]
data_2 <- sorted_name %>% filter(V2 <= 5)
Edit: here is my current version:
data_name <- data.frame(cbind(cust_det, adist(cust_det, "JohnSmith", partial = TRUE, ignore.case = TRUE)))
data_name$V2 <- as.numeric(as.character(data_name$V2))
sorted_name <- data_name[order(data_name$V2), ]
Name_search <- sorted_name %>% filter(V2 <= 5)
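I've also read that the stringdist package is faster than adist() for large vectors. Here is a rough sketch of how I think it would be used; the column name 'name' is just a placeholder for whatever the name column in cust_det is actually called, and note that this computes a full-string Levenshtein distance rather than the partial matching that adist(partial = TRUE) does:
library(stringdist)
# 'name' is a placeholder column; method = "lv" is plain Levenshtein distance
cust_det$dist <- stringdist(tolower(cust_det$name), "johnsmith", method = "lv")
Name_search <- cust_det %>% filter(dist <= 5) %>% arrange(dist)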
So, I want to ask the R community:
1) How can I improve the performance of this script, and is there a library for handling large data in R?
2) How can I handle out-of-memory (larger-than-memory) data in R when the dataset is large?
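For (2), the only idea I have so far is to fetch the table in chunks instead of pulling everything with select *, and to keep only the rows that match. Something like the sketch below; the NAME column is a placeholder, and I haven't verified that ROracle's chunked fetch behaves exactly like this:
# Sketch: fetch only the name column, in chunks, and keep matches as we go
res <- dbSendQuery(con, "select name from cust_det")
matches <- list()
repeat {
  chunk <- dbFetch(res, n = 50000)   # pull 50k rows at a time
  if (nrow(chunk) == 0) break
  d <- adist(chunk$NAME, "JohnSmith", partial = TRUE, ignore.case = TRUE)[, 1]
  keep <- d <= 5
  if (any(keep)) {
    matches[[length(matches) + 1]] <- data.frame(name = chunk$NAME[keep], dist = d[keep])
  }
}
dbClearResult(res)
Name_search <- do.call(rbind, matches)
Name_search <- Name_search[order(Name_search$dist), ]
Is this the right direction, or is there a better-established way to do this?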