Very high RAM usage while working with Arrow for R


I am working with big tables. I figured I could use Arrow to keep my RAM usage relatively low and do all the calculations out of memory.

I spent quite a bit of time reading through the Arrow for R documentation and watched a few YouTube videos. However, I'm still struggling to understand what is happening:

  1. open .parquet file using arrow::open_dataset()
  2. write dplyr-like code to join 4 tables
  3. use %>% compute() at the end of dplyr code
  4. RAM jumps from 100MB to 60GB
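
Roughly, the steps above look like the sketch below. The table names, column names, and temporary files are made up for illustration and stand in for my real data, which is much larger; only `open_dataset()`, the dplyr joins, and `compute()` mirror what I actually run.

```r
library(arrow)
library(dplyr)

# Hypothetical stand-in data: four small tables written to temporary Parquet files.
demo_dir <- tempfile("arrow_demo_")
dir.create(demo_dir)
ids <- 1:5
demo_tables <- list(
  a = data.frame(id = ids, x = ids * 10),
  b = data.frame(id = ids, y = ids * 100),
  c = data.frame(id = ids, z = letters[ids]),
  d = data.frame(id = ids, w = ids %% 2 == 0)
)
for (nm in names(demo_tables)) {
  write_parquet(demo_tables[[nm]], file.path(demo_dir, paste0(nm, ".parquet")))
}

# 1. Open each .parquet file lazily; no rows are read into R at this point.
ds_a <- open_dataset(file.path(demo_dir, "a.parquet"))
ds_b <- open_dataset(file.path(demo_dir, "b.parquet"))
ds_c <- open_dataset(file.path(demo_dir, "c.parquet"))
ds_d <- open_dataset(file.path(demo_dir, "d.parquet"))

# 2. dplyr-like code joining the 4 tables (this only builds a query).
# 3. compute() runs the query and returns the result as an Arrow Table.
result <- ds_a %>%
  inner_join(ds_b, by = "id") %>%
  inner_join(ds_c, by = "id") %>%
  inner_join(ds_d, by = "id") %>%
  compute()

print(dim(result))
```

As far as I understand, the Table returned by `compute()` is held in memory, so the full join result still has to fit in RAM at that step.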

If I do the same thing with plain dplyr (without Arrow), RAM usage jumps to 50GB.

I'm referring to the RAM usage report in RStudio.

Am I missing a step to lower RAM usage while using Arrow?

