Hi
I am using R + PostgreSQL to organize the data from atlas.media.mit.edu, and I've noticed that writing to the DB from RStudio Server is slow or unresponsive.
This is what I'm doing:
if (!require("pacman")) install.packages("pacman")
pacman::p_load(data.table, feather, dplyr, tidyr, doParallel, odbc, DBI)

n_cores <- parallel::detectCores()  # workers for the mclapply calls below

# extract the yearly archives (-aos skips files that already exist)
yearly_list <- list.files("sitc2", full.names = TRUE)
try(dir.create("yearly-data"))
unzip_yearly <- function(t) {
  system(paste("7z e -aos", yearly_list[[t]], "-oyearly-data/"))
}
mclapply(seq_along(yearly_list), unzip_yearly, mc.cores = n_cores)

# read every feather file into a list of data frames
yearly_list_feather <- list.files("yearly-data", full.names = TRUE)
yearly_data_full <- mclapply(yearly_list_feather, read_feather, mc.cores = n_cores)
con <- dbConnect(odbc::odbc(), Driver = "PostgreSQL",
                 Server = "oeccube.datawheel.us", Port = "5432", Database = "oeccube",
                 UID = "deploy", PWD = rstudioapi::askForPassword("Database password:"))
# append each year's data frame to the same table
for (t in seq_along(yearly_data_full)) {
  x <- yearly_data_full[[t]]
  dbWriteTable(con, "test", x, overwrite = FALSE, append = TRUE)
  message(paste0(round(100 * t / length(yearly_data_full), 1), "% ready"))
}
I also tested just the first iteration of that loop (t = 1, a single data frame of ~200 MB): it ran for more than 24 hours, so I killed the process.
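That makes me suspect the ODBC driver is inserting row by row. One alternative I've been considering is the RPostgres driver, whose dbWriteTable can send the whole data frame through COPY FROM STDIN when copy = TRUE. A minimal sketch, assuming RPostgres is installed and the same credentials and target table apply:

p_load(RPostgres)
con2 <- dbConnect(RPostgres::Postgres(),
                  host = "oeccube.datawheel.us", port = 5432, dbname = "oeccube",
                  user = "deploy", password = rstudioapi::askForPassword("Database password:"))
for (t in seq_along(yearly_data_full)) {
  # copy = TRUE serializes the data frame and loads it via COPY FROM STDIN
  dbWriteTable(con2, "test", yearly_data_full[[t]], append = TRUE, copy = TRUE)
}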
By contrast, the equivalent load from the command line is really fast:
psql -h oeccube.datawheel.us -d oeccube -U deploy -c "\copy sitc_rev2 (year, reporter_iso, partner_iso, commodity_code, export_usd, import_usd, export_kg, import_kg, marker) from 'sitc2.csv' with delimiter as ',' CSV HEADER"
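So another workaround would be to fwrite each data frame to CSV and shell out to that same command from R. A rough sketch, assuming psql is on the PATH and a ~/.pgpass entry avoids the password prompt (copy_via_psql is a hypothetical helper, and the CSV column order must match the table):

copy_via_psql <- function(csv_path, table = "sitc_rev2") {
  # build the same \copy invocation as above and run it through the shell
  cmd <- sprintf("psql -h oeccube.datawheel.us -d oeccube -U deploy -c \"\\copy %s from '%s' with delimiter as ',' CSV HEADER\"", table, csv_path)
  system(cmd)
}

for (t in seq_along(yearly_data_full)) {
  fwrite(yearly_data_full[[t]], "tmp.csv")  # data.table::fwrite is fast
  copy_via_psql("tmp.csv")
}

But driving psql from R feels like a hack, so I'd rather understand why the dbWriteTable path is this slow.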
Any ideas?