bind_rows(): multiple RDS files

budugulo · August 26, 2020, 7:32pm

I have many .RDS files in "./data/source1". The files contain the same variables; only the date varies. I would like to use bind_rows() to append the data sets. Rather than appending the files one by one, how could I use bind_rows() more efficiently?

jrmuirhead · August 26, 2020, 9:36pm

I don't know the structure of your .RDS files, but I would probably do this in three steps. This approach also records each file's name in a column, so you know which file every row came from.

Create a vector of filenames with full paths, using the here package to specify the project root directory.

library("here")
library("readr")
library("purrr")
library("dplyr")  # provides bind_rows()
library("fs")

# list.files() takes a regular expression, not a glob
dir_list <- list.files(here("data/source1"),
  pattern = "\\.RDS$", full.names = TRUE)

Name the vector using just the filename without the extension.

names(dir_list) <- path_ext_remove(path_file(dir_list))
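To see what the naming step produces, here is a minimal sketch with two made-up paths (the directory and file names are hypothetical, not taken from your data). It takes the names from the path vector itself, so names and paths cannot get out of order.

```r
library(fs)

# Hypothetical full paths, standing in for the output of list.files()
paths <- c(
  "/project/data/source1/sales_2020-01.RDS",
  "/project/data/source1/sales_2020-02.RDS"
)

# path_file() drops the directory, path_ext_remove() drops the extension
names(paths) <- path_ext_remove(path_file(paths))

names(paths)
```

Each name later becomes the value of the "source" column for the rows read from that file.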

Read the files into a single list and combine them.

file_list <- map(dir_list, read_rds)
files_df <- bind_rows(file_list, .id = "source")

These last two lines can also be shortened to:

files_df <- map_dfr(dir_list, read_rds, .id = "source")
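For anyone who wants to try the whole workflow without touching real data, here is a self-contained sketch. It writes two toy files to a temporary directory (the data, file names, and the "source1_demo" folder are all made up) and combines them with list_rbind(), which assumes purrr 1.0.0 or later, where map_dfr() is marked as superseded.

```r
library(readr)
library(purrr)
library(fs)

# Hypothetical toy data written to a temporary directory
tmp <- file.path(tempdir(), "source1_demo")
dir.create(tmp, showWarnings = FALSE)
write_rds(data.frame(id = 1:2, value = c(10, 20)), file.path(tmp, "2020-08-01.RDS"))
write_rds(data.frame(id = 3:4, value = c(30, 40)), file.path(tmp, "2020-08-02.RDS"))

# Full paths, named by filename without the extension
paths <- list.files(tmp, pattern = "\\.RDS$", full.names = TRUE)
names(paths) <- path_ext_remove(path_file(paths))

# Read every file, then stack the list; names_to plays the role of .id
files_df <- paths %>%
  map(read_rds) %>%
  list_rbind(names_to = "source")

files_df
```

On older purrr versions, the map_dfr() one-liner does the same job.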

budugulo · August 26, 2020, 11:15pm

Thanks a lot for the detailed instructions! This works perfectly; just for reference: one comma should be deleted in step 2.

I just found out that one of the variables has a varying data type: character and numeric. Is there any way I can read all the variables as characters?

jrmuirhead · August 26, 2020, 11:52pm

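There's no way to specify the data type for read_rds like you can for read_csv, because an .RDS file stores each column with its type already set. Note that bind_rows() does not coerce a numeric column to character when the types differ between files; it stops with an error instead. So do the conversion before binding, while each file is still a separate data frame in the list. To make all the numeric variables into character classes (use everything() in place of where(is.numeric) to convert every variable):

file_list <- map(dir_list, read_rds) %>%
  map(~ mutate(.x, across(where(is.numeric), as.character)))
files_df <- bind_rows(file_list, .id = "source")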
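Here is a small sketch for checking how a mixed-type column behaves, using two made-up tibbles in place of the files (the column names "id" and "code" are hypothetical). It assumes a recent dplyr (1.0.0 or later), where binding a numeric column to a character column raises an error, so every column is converted to character before binding.

```r
library(dplyr)
library(purrr)

# Hypothetical data: `code` is numeric in one "file" and character in the other
file_list <- list(
  a = tibble(id = 1:2, code = c(101, 102)),
  b = tibble(id = 3:4, code = c("A1", "B2"))
)

# Binding directly is expected to fail on the type mismatch
direct <- try(bind_rows(file_list, .id = "source"), silent = TRUE)
inherits(direct, "try-error")

# Convert every column to character in each data frame, then bind
files_df <- file_list %>%
  map(~ mutate(.x, across(everything(), as.character))) %>%
  bind_rows(.id = "source")

files_df
```

If you later want proper types back for the columns that are consistent, readr::type_convert() can re-guess them from the character values.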
