Large memory usage of the {duckdb} package

hey everyone,

I have the same problem of excessive memory use, with a memory_limit that is apparently not respected.

In the code below I create a table of 450M rows and 5 columns by replicating iris 3 million times. It takes about 16 GB of RAM according to lobstr::obj_size().

I am able to dbWriteTable() the table from memory into duckdb, but the R process memory usage more than doubles, reaching 37 GB according to htop's RES column. I have set memory_limit to 1GB and threads to 1.

After this, I restart the session, reconnect to the duckdb database and try to collect() the data back into RAM. Again, memory usage reaches over 32 GB while reading, before dropping back to around 16 GB when it is done.

Is this "RAM usage is double the data size" situation normal?
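For reference, the numbers above come from htop's RES column for the R process. A quick way to check roughly the same thing from inside R (assuming the {ps} package is installed; htop is what I actually quote above) would be:

# resident set size (RSS) of the current R process in GB,
# which should roughly match htop's RES column
ps::ps_memory_info(ps::ps_handle())[["rss"]] / 1024^3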

library(DBI)
library(dplyr)
library(dbplyr)
library(duckdb)

duckdb_path <- "/devroot/sandbox/tmp/duckdb.duckdb"

con <- dbConnect(duckdb::duckdb(dbdir = duckdb_path, config = list("memory_limit" = "1GB", "threads" = "1")))
# alternatively, via PRAGMA after connecting:
# dbExecute(con, "PRAGMA threads=1; PRAGMA memory_limit='1GB';")

# run this once to create the duckdb file, then restart the session:
if (FALSE){
  bigdata <- data.table::rbindlist(rlang::rep_along(1:3e6, list(iris)))
  dim(bigdata) # 450M rows, 5 columns
  lobstr::obj_size(bigdata) # 16.20 GB in RAM

  dbWriteTable(con, "straight_from_memory", bigdata)
}
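
# (sketch only, not something I have tried: {duckdb}'s duckdb_register() exposes
#  the data frame as a virtual table without copying it, and CREATE TABLE ... AS
#  then materialises it inside duckdb; the table names below are just placeholders)
if (FALSE) {
  duckdb::duckdb_register(con, "bigdata_view", bigdata)
  dbExecute(con, "CREATE TABLE straight_from_register AS SELECT * FROM bigdata_view")
  duckdb::duckdb_unregister(con, "bigdata_view")
}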


bigdata <- tbl(con, "straight_from_memory") %>% collect()
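
On the read side, I also wondered whether fetching in chunks instead of a single collect() would keep the peak lower. Something like this is what I had in mind (a sketch using plain DBI chunked fetching; I have not tried it on this table, the 1e7 batch size is arbitrary, and it only helps if the duckdb driver streams results rather than materialising them up front):

# stream the result in 10M-row batches and process each batch, the idea being
# that the full 16 GB never has to sit in R at once (assuming the driver streams)
res <- dbSendQuery(con, "SELECT * FROM straight_from_memory")
total_rows <- 0
while (!dbHasCompleted(res)) {
  chunk <- dbFetch(res, n = 1e7)
  total_rows <- total_rows + nrow(chunk)  # placeholder for real per-chunk work
}
dbClearResult(res)
total_rows

Would chunked fetching be the recommended way to keep memory bounded, or should collect() be expected to stay close to the data size?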
