Pass chunk of filename to column name in a loop

Carlos_Gino · July 6, 2022, 12:34am

I have a filenames vector that lists many files of this type:

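library(stringr)

# pattern is a regular expression, not a wildcard, so match a literal ".RDS" at the end
filenames <- list.files(path = "~/GitHub/emmsa/input/prueba",
                        full.names = TRUE, pattern = "\\.RDS$", recursive = TRUE)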
filenames

"C:/Users/gigar/Documents/GitHub/emmsa/input/test/prices_01_01_2014.RDS"
"C:/Users/gigar/Documents/GitHub/emmsa/input/test/prices_02_01_2014.RDS"
"C:/Users/gigar/Documents/GitHub/emmsa/input/test/prices_03_01_2014.RDS"

sites <-  str_extract(filenames, ("[\\d{4}_\\d{2}_\\d{2}]+"))

[1] "_01_01_2014" "_02_01_2014" "_03_01_2014" "_04_01_2014" "_05_01_2014" "_06_01_2014"
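The leading underscore appears because everything inside [...] is a single character class, so the quantifiers {4} and {2} lose their meaning and the trailing + just matches the longest run of characters from that class (digits and underscores among them). A minimal sketch of the fix, assuming every name ends in _DD_MM_YYYY.RDS; the shortened paths below are hypothetical stand-ins for the real ones:

```r
library(stringr)

# Hypothetical shortened paths following the prices_DD_MM_YYYY.RDS naming
filenames <- c(
  "input/test/prices_01_01_2014.RDS",
  "input/test/prices_02_01_2014.RDS",
  "input/test/prices_03_01_2014.RDS"
)

# Original attempt: a character class, so any run of class characters matches
str_extract(filenames, "[\\d{4}_\\d{2}_\\d{2}]+")
# returns "_01_01_2014" "_02_01_2014" "_03_01_2014"

# Quantifiers outside a class: two digits, underscore, two digits, underscore, four digits
date_str <- str_extract(filenames, "\\d{2}_\\d{2}_\\d{4}")

# The names are day_month_year, so the format string must follow the same order
dates <- as.Date(date_str, format = "%d_%m_%Y")
dates
```

The same "\\d{2}_\\d{2}_\\d{4}" pattern can replace the site pattern in the linked Stack Overflow code.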

I want to apply the approach from this question, https://stackoverflow.com/questions/46299777/add-filename-column-to-table-as-multiple-files-are-read-and-bound , for which I need to extract the date from each file name and convert it into a date column for each file. Could you help me adapt this code? I have been stuck with these results:

filenames <- list.files(path, full.names = TRUE, pattern = fileptrn, recursive = TRUE)
sites <- str_extract(filenames, "[A-Z]{2}-[A-Za-z0-9]{3}") # same length as filenames


I am stuck on the regular-expression part of the code.
My goal is to get the date from each file name and add it as a date column to every file I have.
I would appreciate your help with this; I know I'm close, but I can't find the solution. Thanks in advance.

williaml · July 6, 2022, 1:46am

Hi, are you after something like this? I have created fake data, but the same approach should apply to your problem.

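library(tidyverse)

# fake files (create the folder first so write_csv() has somewhere to write)
dir.create("files_test", showWarnings = FALSE)
write_csv(iris[1:5,], "files_test/iris_01_04_2022.csv")
write_csv(iris[6:10,], "files_test/iris_02_04_2022.csv")

# list the files and name each path after itself, so .id can record it
filenames <- fs::dir_ls("files_test", glob = "*.csv") %>% 
  set_names()

# read and stack every file, then pull the DD_MM_YYYY chunk out of the path
map_df(filenames, ~read_csv(.x), .id = "filename") %>% 
  mutate(date = str_extract(filename, "\\d{2}_\\d{2}_\\d{4}"),
         date = as.Date(date, format = "%d_%m_%Y"))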

# A tibble: 10 × 7
   filename                       Sepal.Length Sepal.Width Petal.Length Petal.Width Species date      
   <chr>                                 <dbl>       <dbl>        <dbl>       <dbl> <chr>   <date>    
 1 files_test/iris_01_04_2022.csv          5.1         3.5          1.4         0.2 setosa  2022-04-01
 2 files_test/iris_01_04_2022.csv          4.9         3            1.4         0.2 setosa  2022-04-01
 3 files_test/iris_01_04_2022.csv          4.7         3.2          1.3         0.2 setosa  2022-04-01
 4 files_test/iris_01_04_2022.csv          4.6         3.1          1.5         0.2 setosa  2022-04-01
 5 files_test/iris_01_04_2022.csv          5           3.6          1.4         0.2 setosa  2022-04-01
 6 files_test/iris_02_04_2022.csv          5.4         3.9          1.7         0.4 setosa  2022-04-02
 7 files_test/iris_02_04_2022.csv          4.6         3.4          1.4         0.3 setosa  2022-04-02
 8 files_test/iris_02_04_2022.csv          5           3.4          1.5         0.2 setosa  2022-04-02
 9 files_test/iris_02_04_2022.csv          4.4         2.9          1.4         0.2 setosa  2022-04-02
10 files_test/iris_02_04_2022.csv          4.9         3.1          1.5         0.1 setosa  2022-04-02
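The filename column comes from two pieces working together: set_names() names each element of filenames after its own path, and .id = "filename" tells map_df() (which hands its results to dplyr::bind_rows()) to store each element's name in a new column. A small in-memory sketch of that mechanism, using made-up data frames in place of the files:

```r
library(dplyr)
library(purrr)

# Two hypothetical "files" already read into memory, keyed by their paths
pieces <- list(
  "files_test/iris_01_04_2022.csv" = data.frame(x = 1:2),
  "files_test/iris_02_04_2022.csv" = data.frame(x = 3:5)
)

# .id turns each list name into a value of the new "filename" column
combined <- bind_rows(pieces, .id = "filename")
combined

# set_names() with no extra arguments names a character vector after itself,
# which is what makes the paths available as names for .id
paths <- set_names(c("a.csv", "b.csv"))
names(paths)
```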

Carlos_Gino · July 6, 2022, 7:54pm

Thank you very much for your answer, @William Lay, but I have a couple of observations. The "filename" column (the path) should not be in the final table; only the date variable should appear. I have tried to remove the "filename" column, but the code does not work completely. Could you please help me fix this?
Finally, the function fs::dir_ls doesn't work for me, because I use ".rds" files.

williaml · July 6, 2022, 8:57pm

The .rds files shouldn't be a problem.

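library(tidyverse)

# fake files (the folder must exist before write_rds() can save into it)
dir.create("files_test", showWarnings = FALSE)
write_rds(iris[1:3,], "files_test/iris_01_04_2022.rds")
write_rds(iris[4:6,], "files_test/iris_02_04_2022.rds")

# only pick up the .rds files, in case the folder still holds the earlier .csv files
filenames <- fs::dir_ls("files_test", glob = "*.rds") %>% 
  set_names()

# stack the files, parse the date from each path, then drop the path column
new_file <- map_df(filenames, ~read_rds(.x), .id = "filename") %>% 
  mutate(date = str_extract(filename, "\\d{2}_\\d{2}_\\d{4}"),
         date = as.Date(date, format = "%d_%m_%Y")) %>% 
  select(-filename)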

# output --------------------------
> new_file
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species       date
1          5.1         3.5          1.4         0.2  setosa 2022-04-01
2          4.9         3.0          1.4         0.2  setosa 2022-04-01
3          4.7         3.2          1.3         0.2  setosa 2022-04-01
4          4.6         3.1          1.5         0.2  setosa 2022-04-02
5          5.0         3.6          1.4         0.2  setosa 2022-04-02
6          5.4         3.9          1.7         0.4  setosa 2022-04-02
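If you would rather never create the filename column at all, one alternative (a different technique from the answer above) is to add the date inside the per-file function, before the rows are bound. The sketch below also keeps base list.files(), as in the original question, with ignore.case = TRUE so both .rds and .RDS match. It writes two hypothetical .RDS files with a made-up price column to a temporary folder so it runs on its own; in practice input_dir would be your own input folder:

```r
library(dplyr)
library(purrr)
library(readr)
library(stringr)

# Hypothetical input folder with files named like prices_DD_MM_YYYY.RDS
input_dir <- file.path(tempdir(), "prices_demo")
dir.create(input_dir, showWarnings = FALSE)
write_rds(data.frame(price = c(1.5, 2.0)), file.path(input_dir, "prices_01_01_2014.RDS"))
write_rds(data.frame(price = 3.25), file.path(input_dir, "prices_02_01_2014.RDS"))

# pattern is a regular expression; ignore.case lets ".rds" and ".RDS" both match
filenames <- list.files(input_dir, pattern = "\\.rds$", full.names = TRUE,
                        recursive = TRUE, ignore.case = TRUE)

# Read one file and attach the date parsed from its name (day_month_year)
read_with_date <- function(path) {
  read_rds(path) %>%
    mutate(date = as.Date(str_extract(basename(path), "\\d{2}_\\d{2}_\\d{4}"),
                          format = "%d_%m_%Y"))
}

# filenames is unnamed and no .id is given, so no path column is created
prices <- map_dfr(filenames, read_with_date)
prices
```

Because the date is computed per file, there is nothing to drop afterwards; the select(-filename) step in the answer above works equally well.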

Carlos_Gino · July 6, 2022, 9:37pm

Thank you very much, this is the code I was looking for, @williaml.

Carlos_Gino · July 7, 2022, 6:42pm

Thanks, it's OK. Gino Garibotto