Hello All,
I am trying to extract data from JOIN of 4 Oracle views. The total number of fields is ~125 and volume of data is ~300MB with 3.3Million records. However performance is not as per the expectations as it took more than 12 hours. I need to execute this for more than 10 GB of data. What approach/package we need to apply to make it more faster and efficient. Please note I have 8 GB of RAM in my system. Any help will be appreciated.
Below is code -
Start_Time <- Sys.time()
Sys.setenv(JAVA_HOME='C:\\Program Files\\Java\\jre7')
options(java.parameters='-Xmx8g')
memory.limit(size=10000000000024)
library(rJava)
library(RJDBC)
drv <- JDBC(driverClass="oracle.jdbc.OracleDriver", classPath="<<PATH_TO_OJDBC.JAR>>")
jdbcConnection <- dbConnect(drv, "<<SCHEMA_DETAILS>>", "<<USER_ID>>", "<<PASSWORD>>")
sqlquery <- "select
ABC.F_1,
PQR.F_2,
JKL.F_3,
.
.
.
XYZ.F_125
FROM ABC, PQR, JKL, XYZ where ABC.PK=PQR.PK AND ABC.PK=JKL.PK AND ABC.PK=XYZ.PK"
res <- dbSendQuery(jdbcConnection,sqlquery)
result <- list()
i=1
result[[i]] <- fetch(res,n=100000)
while(nrow(chunk <- fetch(res, n=100000)) > 0){
i<-i+1
result[[i]] <- chunk
}
output_data <- as.data.table(do.call(rbind, result))
save(output_data, file='output_data.RData')
gc()
rm(output_data)
End_Time <- Sys.time()