I'm asking because every Stata file is close to 200 MB... imagine what happens to your RAM when you try to join many 200 MB files.
In Stata I just ran a loop that read each .dta file in turn, never keeping the previous one in memory.
What would you do?
I hope I made myself clear.
As always, thanks for your time, interest, and patience.
Then you should ask yourself whether you could, potentially, work with the merged dataset at all.
Why not apply the same approach here? Work on the files sequentially, processing them one after another. Or perhaps you can pull only a subset of the data (particular columns) from the source files, lowering the demand on resources. Or compute the required values in a loop over the Stata files and rbind the results into another data frame. Or export the Stata files to any text format you like and merge them outside R with cat or awk.
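For the column-subsetting idea in particular: haven's read_dta() accepts a col_select argument, so each file can be loaded with only the variables you actually need and reduced to a small summary before the next one is read. A minimal sketch, assuming a hypothetical folder of .dta files and hypothetical columns var1, var2, var3 (adjust to your real names):

library(haven)
library(dplyr)

files <- list.files("data", pattern = "\\.dta$", full.names = TRUE)

pieces <- vector("list", length(files))
for (i in seq_along(files)) {
  # read only the three needed columns, never the full 200 MB file
  d <- read_dta(files[i], col_select = c(var1, var2, var3))
  pieces[[i]] <- d %>%
    group_by(var1, var2) %>%
    summarise(total = sum(var3), .groups = "drop")
}
combined <- bind_rows(pieces)

Only one raw file plus the small per-file summaries are ever in memory at the same time.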
Hello,
I used the same approach of declaring a for (i in 1:...) loop, but it doesn't seem like a purrr-based approach.
I also don't know how to import only the columns I need with purrr.
Maybe I could merge just a few selected variables. Maybe my RAM would hold up... maybe.
library(haven)   # read_dta()
library(dplyr)   # group_by(), summarise(), bind_rows()

# list_files is the character vector of .dta paths built earlier
for (i in 1:100) {
  data <- read_dta(list_files[i])   # only one file in memory at a time
  tabx <- data %>%
    group_by(var1, var2) %>%
    summarise(var_y = sum(var3) / 1000, .groups = "drop")
  if (i == 1) {
    output <- tabx                        # first iteration starts the result
  } else {
    output <- bind_rows(output, tabx)     # later iterations append to it
  }
}
I think it works OK; I just don't know whether it is inefficient.
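For comparison, I suppose a purrr-based version of the same computation would look roughly like this (an untested sketch, using the same list_files vector and haven's col_select to pull only the columns I need):

library(haven)
library(dplyr)
library(purrr)

summarise_one <- function(path) {
  # each call loads a single file, restricted to the needed columns
  read_dta(path, col_select = c(var1, var2, var3)) %>%
    group_by(var1, var2) %>%
    summarise(var_y = sum(var3) / 1000, .groups = "drop")
}

# map() returns one small summary table per file; list_rbind() stacks them
output <- list_rbind(map(list_files, summarise_one))

On older purrr versions, map_dfr(list_files, summarise_one) should do the same thing in one call.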
The other method I tried was using assign() and then mget() on all the per-file tabulations.
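Roughly, the idea there was something like this (a sketch of the pattern, not my exact code):

for (i in 1:100) {
  data <- read_dta(list_files[i])
  # assign() creates tab1, tab2, ... in the workspace, one per file
  assign(paste0("tab", i),
         data %>%
           group_by(var1, var2) %>%
           summarise(var_y = sum(var3) / 1000, .groups = "drop"))
}
# mget() gathers all of them into a list, which bind_rows() stacks
output <- bind_rows(mget(paste0("tab", 1:100)))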
Any other suggestions would be appreciated.
Thanks, community