Is it possible to read/import specific variables from an RData file?

jfca283 · April 5, 2023, 5:42am

Hello,
Some time ago I asked this question about a Stata file.
Stata can read the variable names and then import only the selected ones.
For example, if the dta file has 500 columns, including the variables SEX and AGE, I can import only those two columns.
Only those two columns are read and loaded into memory.
This is very fast when you are looping through big datasets.

Is it possible to do the same thing with an RData file?
That is, to import only the specified columns/variable names so I can run a loop over them?
I read the help file and it doesn't mention that capability.

If I write a loop over an RData file, I'm always reading and importing all the variables.
Using select from dplyr doesn't improve the processing time, because I'm still forced to read many variables that aren't relevant.
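A minimal base-R sketch of the behaviour described above, using a small hypothetical data frame saved to a temporary file: load() has no argument for choosing columns, so the whole stored object is restored first and any column selection can only happen afterwards.

```r
# Hypothetical example data: SEX and AGE plus two columns we don't need
df <- data.frame(SEX = c(1, 2, 1), AGE = c(34, 51, 27),
                 INCOME = c(100, 200, 150), REGION = c("N", "S", "E"))
path <- tempfile(fileext = ".RData")
save(df, file = path)

# load() restores every object stored in the file into the environment;
# there is no way to ask it for only some columns of a data frame.
env <- new.env()
load(path, envir = env)
print(ls(env))
print(ncol(env$df))

# Selecting columns afterwards only shrinks what you keep, not what was read.
small <- env$df[, c("SEX", "AGE")]
```

This is why dplyr::select() after load() cannot save reading time: the full file has already been deserialized by the time select() runs.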

I hope I made myself clear. It's a strange question in some ways.
Thanks for your time and interest.
Have a nice day.

martin.R · April 5, 2023, 9:31am

I would suggest not using .RData (or .rds) files for your needs.

Use csv files or arrow (Arrow R Package) if you want to read a subset of columns.
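A minimal sketch of the arrow route, assuming the arrow package is installed and using hypothetical example data: the data are written once to a Parquet file, and read_parquet() then reads only the columns named in col_select.

```r
library(arrow)

# Hypothetical example data standing in for a wide dataset
df <- data.frame(SEX = c(1, 2, 1), AGE = c(34, 51, 27),
                 INCOME = c(100, 200, 150), REGION = c("N", "S", "E"))
path <- tempfile(fileext = ".parquet")
write_parquet(df, path)

# Parquet is columnar, so only the requested columns are read from disk.
# col_select uses tidyselect, so c(SEX, AGE) would also work.
sub <- read_parquet(path, col_select = c("SEX", "AGE"))
print(sub)
```

For a folder of many files, arrow::open_dataset() followed by dplyr::select() and collect() should likewise read only the selected columns.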

jfca283 · April 6, 2023, 1:28am

That's a good suggestion.
But sometimes the DTA files come with labels,
and CSV is a poor format when you need to keep them.
I recently read that haven can read only selected columns with the col_select option,
but RData hasn't implemented that option yet.
I joined 12 dta files, each one 80 MB.
An RData file containing all of them amounts to only 70 MB.
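A sketch of haven's column selection, assuming a haven version whose read_dta() supports col_select; the labelled example data and temporary file are hypothetical. It also shows that value labels on the selected columns survive the round trip, which is the concern raised about CSV.

```r
library(haven)

# Hypothetical data: SEX carries Stata value labels, INCOME is not needed
df <- data.frame(AGE = c(34, 51, 27), INCOME = c(100, 200, 150))
df$SEX <- labelled(c(1, 2, 1), c(Male = 1, Female = 2))
path <- tempfile(fileext = ".dta")
write_dta(df, path)

# Read only SEX and AGE; col_select takes tidyselect expressions
sub <- read_dta(path, col_select = c(SEX, AGE))
print(names(sub))
print(attr(sub$SEX, "labels"))  # value labels are kept
```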

martin.R · April 6, 2023, 8:28am

.RData is not a data frame format, so it is not suitable for your needs. The smaller size is because you have allowed it to be compressed.

I don't know Stata, but yes, haven is there to support such formats, so use that package.

jfca283 · April 8, 2023, 1:54am

I just solved it using CSV or DTA files.
RData has no equivalent of the col_select or select option.
data.table::fread or haven::read_dta can perform the desired task.
fread was the best option.
Thanks for your ideas.
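A sketch of the fread approach, assuming data.table is installed; the example file and the repeated file vector are hypothetical stand-ins for the real CSV files. The select argument restricts which columns fread parses, and the loop reads the same two columns from every file before stacking them.

```r
library(data.table)

# Hypothetical example file with two wanted and two unwanted columns
dt <- data.table(SEX = c(1, 2, 1), AGE = c(34, 51, 27),
                 INCOME = c(100, 200, 150), REGION = c("N", "S", "E"))
path <- tempfile(fileext = ".csv")
fwrite(dt, path)

# Read a single file, keeping only SEX and AGE
sub <- fread(path, select = c("SEX", "AGE"))

# Loop over several files (here the same file twice, as a stand-in),
# reading only the selected columns and stacking the results
files <- c(path, path)
res <- rbindlist(lapply(files, fread, select = c("SEX", "AGE")))
print(res)
```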

system · April 15, 2023, 1:54am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.