Difference between .RData file created in Windows and Linux

AkhileshPandey · August 19, 2020, 4:17pm

Hi,

I am trying to save a data frame using the code below:

save(df_Name, file = paste0(folderPath, 'df_Name.RData'))

My data frame has approx. 1,200,000 rows and 60 columns. The .RData file created from the same data frame with the above code differs in size between Windows and Linux: on Windows the file is approx. 229 MB, while on Linux it is 2 GB.

What is the potential reason for this? Is there any workaround for me to reduce the size of files in Linux?

grosscol · August 19, 2020, 4:50pm

That is interesting!

Is the data the same on both systems?

Size in Memory

Does object.size(my_data_frame) return approximately the same value on both systems?

The pryr package also provides object_size(), which might return a different result because it takes additional considerations into account, but for simple data frames I would expect it to match the object.size() result.
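A minimal sketch of the in-memory check, using only base R. The data frame here is a small made-up stand-in (the name df_Name is borrowed from the question, the columns are hypothetical); run the same lines against the real object on both machines and compare the output. Printing the column classes is an extra check I am assuming could be useful, since the same data stored as different types would not have the same footprint.

```r
# Hypothetical stand-in for the real data frame; replace with your own object.
set.seed(1)
df_Name <- data.frame(
  id    = 1:1000,
  value = rnorm(1000),
  label = sample(letters, 1000, replace = TRUE),
  stringsAsFactors = FALSE
)

# In-memory size in bytes, as a plain number that is easy to compare.
mem_bytes <- as.numeric(object.size(df_Name))
print(mem_bytes)

# The same size in human-readable units.
print(object.size(df_Name), units = "auto")

# Class of each column, to confirm both systems hold the data the same way.
col_classes <- sapply(df_Name, class)
print(col_classes)
```

If the byte counts or the column classes differ between Windows and Linux, the difference is already present before anything is written to disk.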

Content

Run a digest on the columns of the data set under both systems. Are they the same?

library(digest)
df <- data.frame(foo=c(1,2,3,4),
                 bar=c(5,6,7,8),
                 baz=c(9,9,9,9))

col_digests <- sapply(df, digest)

Are the digests the same before and after round trip?

After you round trip the data (serialize to disk and read back to R), is the data the same on both systems? Re-run object.size and digest as above. Do the values match the before & after on the same system? Do they match between the systems?
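Roughly, the round trip could look like the sketch below. It reuses the toy data frame from above, writes to a temporary file (an assumption for illustration; use your real path), and loads the result into a separate environment so the original object is not overwritten. It assumes the digest package is installed.

```r
library(digest)

df <- data.frame(foo = c(1, 2, 3, 4),
                 bar = c(5, 6, 7, 8),
                 baz = c(9, 9, 9, 9))

# Record size and per-column digests before writing to disk.
before_size    <- object.size(df)
before_digests <- sapply(df, digest)

# Serialize to disk; tempfile() is only a placeholder location.
path <- tempfile(fileext = ".RData")
save(df, file = path)

# Read back into a fresh environment to avoid clobbering df.
restored <- new.env()
load(path, envir = restored)

# Record the same measurements after the round trip.
after_size    <- object.size(restored$df)
after_digests <- sapply(restored$df, digest)

same_size    <- identical(before_size, after_size)
same_digests <- identical(before_digests, after_digests)
file_bytes   <- file.size(path)

print(c(same_size = same_size, same_digests = same_digests))
print(file_bytes)
```

Comparing the digests and file_bytes from each machine should show whether the content differs or only the on-disk representation does.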

martin.R · August 19, 2020, 7:23pm

I very vaguely remember having a similar issue some time ago.

I think I ended up explicitly specifying the compress parameter. Check ?save for all options.
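Something along these lines, as a sketch rather than a confirmed fix. save() accepts compress as a logical or as one of "gzip", "bzip2" or "xz"; the loop below writes the same object with each setting and reports the file size. The data frame and the use of tempdir() for folderPath are made up for illustration.

```r
# Hypothetical stand-ins for the objects in the question.
df_Name <- data.frame(
  id    = 1:10000,
  value = rep(c(1.5, 2.5), 5000),
  label = rep(c("a", "b"), 5000),
  stringsAsFactors = FALSE
)
folderPath <- tempdir()

# Compression settings to compare; FALSE means no compression.
settings <- list(none = FALSE, gzip = "gzip", bzip2 = "bzip2", xz = "xz")

# Save with each setting and collect the resulting file size in bytes.
sizes <- sapply(settings, function(compress) {
  path <- file.path(folderPath, "df_Name.RData")
  save(df_Name, file = path, compress = compress)
  file.size(path)
})

print(sizes)
```

Running this on both systems with the real data frame would show whether one of them is effectively writing an uncompressed file. Setting compress explicitly in the original save() call then gives the same behaviour on both. Note that file.path() is used here instead of paste0() so the path does not depend on folderPath ending in a separator.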
