I am running into an issue reading several large .gz.csv files with read_csv() and quickly hitting a "no space left on device" error because the files are, I suppose, being temporarily decompressed into my /tmp folder.
Is there a way to change that behavior? For instance, could the files be decompressed in the same directory as the .gz.csv files? Does that make sense?
EDIT: I now realize this is indeed a matter of redirecting the temporary files to a folder other than the default. Is there a way to do so in RStudio?
The tempfile used is always created in the R session's temporary directory. You can use any of the shell environment variables TMPDIR, TMP and TEMP to control where the temporary directory is located, but this only has an effect when the R session first starts. You can't change the temporary directory within a running session.
You can use usethis::edit_r_environ() to open the user-level .Renviron file, where you can define one of those variables, e.g.
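A minimal sketch of what that file could contain (the path is just a placeholder for a directory on a volume with enough free space):

```
# .Renviron -- /data/r-tmp is a placeholder; point it at any directory with room
TMPDIR=/data/r-tmp
```

After restarting R, tempdir() should point somewhere under that directory, and the decompressed files will land there instead of /tmp.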
I think there is an interesting follow-up here. @jimhester, is there a way to clean the temp directory during the session?
Here is the idea: doing something like list.files('mydir', full.names = TRUE) %>% map(~ read_csv(.x))
causes problems because, when reading hundreds of different .gz files, the temporary files accumulate until they hit the disk limit.
I know I can call gc() during the session to free memory from unused objects, but is there an equivalent (callable in a loop) that cleans up the temporary directory as well?
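If there is no built-in equivalent, one workaround would be to delete the leftover temp files yourself after each read. This is just a sketch using base R file handling, not a readr/vroom API, and it assumes readr >= 2.0 (for the lazy argument) so nothing is still memory-mapping the temp copy when it gets removed:

```r
library(magrittr)  # for %>%
library(purrr)
library(readr)

read_and_clean <- function(path) {
  # force a full, non-lazy read so the decompressed temp copy is no longer needed
  df <- read_csv(path, lazy = FALSE)
  # then drop whatever was left behind in the session's temp directory;
  # adjust the pattern if other code also writes files there
  leftovers <- list.files(tempdir(), full.names = TRUE, pattern = "\\.csv$")
  unlink(leftovers)
  df
}

dfs <- list.files("mydir", full.names = TRUE) %>% map(read_and_clean)
```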
Thanks!! I'll check it out. Something I don't quite understand is whether vroom is really fast, or whether it only appears fast because it defers loading the data until you need it. In other words, once the computation actually happens, is it no better than readr and the others? Am I missing something? (I surely am.)
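One way to see the difference for yourself (just a sketch; it assumes vroom >= 1.1 for the altrep argument, and uses a small throwaway file so it runs anywhere) is to time the read both lazily and eagerly, and to notice when the parsing cost is actually paid:

```r
library(vroom)

# small placeholder file just to make the sketch self-contained;
# in practice substitute one of your large .gz.csv files
path <- tempfile(fileext = ".csv")
vroom_write(mtcars, path, delim = ",")

# lazy read (the default): returns quickly, columns are indexed but not parsed yet
system.time(df_lazy <- vroom(path))

# touching the values is when the parsing cost is actually paid
system.time(lapply(df_lazy, sum))  # crude way to force every column to materialize

# eager read with ALTREP disabled: all parsing happens up front,
# so this timing is the one comparable to readr and friends
system.time(df_eager <- vroom(path, altrep = FALSE))
```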