I have many .RDS files in "./data/source1". The files contain the same variables; only the date varies. I would like to use bind_rows() to append the data sets. Rather than appending the files one by one, how could I use bind_rows() more efficiently?
I don't know the structure of your .RDS files, but I would probably do this in three steps. This approach also stores each file's name in a column, so you know which file each row came from.
- Create a vector of filenames with full paths, using the here package to specify the root directory.
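library("here")
library("readr")
library("purrr")
library("dplyr")  # provides bind_rows(), mutate(), across() and %>%
library("fs")
# list.files() takes a regular expression, not a glob, so match the extension explicitly
dir_list <- list.files(here("data/source1"),
                       pattern = "\\.RDS$", full.names = TRUE)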
- Name the vector using just the filename without the extension.
# Derive the names from dir_list itself so they always line up with the paths
names(dir_list) <- path_ext_remove(basename(dir_list))
- Read the files into a single list and combine them.
file_list <- map(dir_list, read_rds)
files_df <- bind_rows(file_list, .id = "source")
The last two lines can also be shortened to:
files_df <- map_dfr(dir_list, read_rds, .id = "source")
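For anyone who wants to try the three steps without touching real data, here is a minimal sketch that writes two small .RDS files to a temporary directory and combines them. The demo file names and columns are made up for illustration. It assumes purrr 1.0.0 or later, where map_dfr() is marked as superseded and map() followed by list_rbind() is the suggested replacement; map_dfr() itself still works.

```r
library(purrr)
library(readr)

# Hypothetical demo data: two small files in a temporary directory
demo_dir <- file.path(tempdir(), "source1_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_rds(data.frame(id = 1:2, value = c(10, 20)),
          file.path(demo_dir, "2021-01.RDS"))
write_rds(data.frame(id = 3:4, value = c(30, 40)),
          file.path(demo_dir, "2021-02.RDS"))

# Step 1: full paths to every .RDS file
demo_paths <- list.files(demo_dir, pattern = "\\.RDS$", full.names = TRUE)

# Step 2: name each path by its file name without the extension
names(demo_paths) <- tools::file_path_sans_ext(basename(demo_paths))

# Step 3: read every file, then row-bind; the names end up in "source"
demo_df <- list_rbind(map(demo_paths, read_rds), names_to = "source")
```

The resulting demo_df has one row per input row plus a source column holding the name of the file each row came from.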
Thanks a lot for the detailed instructions! This works perfectly; just for reference: one comma should be deleted in step 2.
I just found out that one of the variables has a varying data type: character in some files and numeric in others. Is there any way I can read all the variables as characters?
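There's no way to specify the data type for read_rds() like you can for read_csv(), because an .RDS file stores the object with its types already set. Note that in current dplyr versions bind_rows() will not coerce for you either: if a column is numeric in one file and character in another, it stops with an error rather than converting the result to character. So the conversion has to happen on each data frame before combining. To turn all the numeric variables into character, you can replace step 3 with
files_df <- dir_list %>%
  map(read_rds) %>%
  map(~ mutate(.x, across(where(is.numeric), as.character))) %>%
  bind_rows(.id = "source")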
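Here is a small self-contained sketch of converting each data frame before binding, which also covers the case where bind_rows() refuses to combine a numeric column with a character one. The list file_list and its columns are hypothetical stand-ins for what map(dir_list, read_rds) would return.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for the list returned by map(dir_list, read_rds):
# "code" is numeric in one element and character in the other
file_list <- list(
  jan = tibble(id = 1:2, code = c(101, 102)),
  feb = tibble(id = 3:4, code = c("A103", "A104"))
)

# Convert numeric columns to character in each data frame, then bind
files_df <- file_list %>%
  map(~ mutate(.x, across(where(is.numeric), as.character))) %>%
  bind_rows(.id = "source")
```

Swapping where(is.numeric) for everything() would convert every column, not only the numeric ones, if you really want all variables as character.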