Importing multiple csv files into RStudio while skipping certain rows in all files

AMM22 · November 4, 2022, 6:26pm

Hi,

I'm having trouble importing multiple csv files at once into RStudio. I have about 400 csv files to import, and would like a way to do it all at once instead of one by one. On top of that, every csv file has 2 useless rows at the beginning that break the import if they're not skipped. Headers are on the 3rd row, rows 4-40 are the actual numerical data, and the last row (41st) is also useless. How can I import all the csv files while applying the above parameters to every file?

Thanks in advance

bisulcsm · November 4, 2022, 6:58pm

Because I don't have sample files to work from, I can't give a tested example, but I would use a combination of tidyverse and fs. You can use dir_ls from the fs package to list all the files to import, then pipe that into read_csv.

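library(fs)
library(tidyverse)

# dir_path is the folder that holds the csv files
# glob = "*.csv" keeps directories and other file types out of the list
csv_files <- dir_ls(dir_path, recurse = TRUE, glob = "*.csv")

# Option 1: hand the whole vector of paths to read_csv (needs readr 2.0 or later)
csv_files %>%
     read_csv(skip = 2, n_max = 37)

### OR

# Option 2: read the files one at a time and row-bind the results
csv_files %>%
     map_dfr(
          ~read_csv(.x, skip = 2, n_max = 37)
     )

The skip and n_max arguments in read_csv do the work here. skip = 2 drops the two useless rows, so the header is read from row 3. n_max counts data rows only (the header is not included), so rows 4-40 work out to n_max = 37, which also leaves out the useless last row. I wasn't able to test this, so check the result on one file first. I'm also not sure whether n_max is applied per file or to the combined total when read_csv gets several paths at once; the map_dfr version calls read_csv once per file, so skip and n_max definitely apply to each file there.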

dir_ls returns a character vector of file paths, which is why it can be piped straight into read_csv or map_dfr.
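
To check the skip and n_max arithmetic without touching the real data, here is a minimal sketch that writes two throwaway files with the layout described in the question (2 junk lines, a header, 37 data rows, 1 junk last line) and reads them back one file at a time. The file names, the id and value columns, and the make_demo_file and read_one helpers are made up for illustration; the real files will have their own columns.

```r
library(readr)
library(purrr)

# Hypothetical helper: writes a 41-line file shaped like the ones in the question.
make_demo_file <- function(path, offset) {
  lines <- c(
    "junk line 1",                                        # line 1, useless
    "junk line 2",                                        # line 2, useless
    "id,value",                                           # line 3, header
    paste(seq_len(37), seq_len(37) + offset, sep = ","),  # lines 4-40, data
    "junk footer"                                         # line 41, useless
  )
  writeLines(lines, path)
}

demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
make_demo_file(file.path(demo_dir, "a.csv"), offset = 0)
make_demo_file(file.path(demo_dir, "b.csv"), offset = 100)

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# skip = 2 drops the two junk lines, so the header is read from line 3.
# n_max = 37 counts data rows only, so the footer on line 41 is never read.
# col_types = cols() lets readr guess the column types without printing them.
read_one <- function(path) {
  read_csv(path, skip = 2, n_max = 37, col_types = cols())
}

# Naming the vector lets .id record which file each row came from.
combined <- map_dfr(set_names(files, basename(files)), read_one, .id = "source_file")
```

If the combined row count is not 37 times the number of files, or a numeric column comes back as character, the skip or n_max value is probably off by one for the real files. The source_file column is optional, but with 400 files it makes it much easier to trace a bad row back to where it came from.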
