hey everyone,
I have the same problem of excessive memory use and memory_limits that are apparently not respected.
In the code below I create a 450M-row, 5-column table by replicating iris 3 million times. It takes about 16GB of RAM according to lobstr::obj_size().
I am able to dbWriteTable() the table from memory to duckdb, but the R process memory usage roughly doubles, reaching 37GB according to htop's RES column. I have set memory_limit to 1GB and threads to 1.
After this, I restart the session, reconnect to the duckdb database and try to collect() the data back into RAM. Again, memory usage climbs above 32GB while reading, before dropping back to around 16GB when it is done.
Is this "RAM usage is double the data size" behaviour normal?
library(DBI)
library(dplyr)
library(dbplyr)
library(duckdb)
duckdb_path <- "/devroot/sandbox/tmp/duckdb.duckdb"
con <- dbConnect(duckdb::duckdb(dbdir = duckdb_path),
                 config = list("memory_limit" = "1GB", "threads" = "1"))
# dbExecute(con, "PRAGMA threads=1; PRAGMA memory_limit='1GB';")  # alternative way to set the same limits
# run this once to create the duckdb file, then restart the session:
if (FALSE) {
  bigdata <- data.table::rbindlist(rlang::rep_along(1:3e6, list(iris)))
  dim(bigdata)               # 450M rows, 5 columns
  lobstr::obj_size(bigdata)  # 16.20 GB in RAM
  dbWriteTable(con, "straight_from_memory", bigdata)
}
bigdata <- tbl(con, "straight_from_memory") %>% collect()
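In case it is relevant, below is the kind of workaround I would try next. It is an untested sketch: the table name straight_from_memory2 and the view name bigdata_view are just placeholders, I have not verified that duckdb_register() actually avoids the extra copy on the write side, and I am not sure whether the duckdb driver truly streams the chunked dbFetch() on the read side or still materialises the whole result first.

# write side: register the data frame as a virtual table and let DuckDB read
# from it directly, instead of copying it through dbWriteTable()
duckdb::duckdb_register(con, "bigdata_view", bigdata)
dbExecute(con, "CREATE TABLE straight_from_memory2 AS SELECT * FROM bigdata_view")
duckdb::duckdb_unregister(con, "bigdata_view")

# read side: pull the result back in chunks instead of one big collect()
res <- dbSendQuery(con, "SELECT * FROM straight_from_memory")
while (!dbHasCompleted(res)) {
  chunk <- dbFetch(res, n = 1e6)  # 1M rows at a time
  # ... process / append chunk ...
}
dbClearResult(res)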