I have a folder that contains many files of this type:
filenames <- list.files(path = "~/GitHub/emmsa/input/prueba",
                        full.names = TRUE, pattern = "*.RDS", recursive = TRUE)
filenames
"C:/Users/gigar/Documents/GitHub/emmsa/input/test/prices_01_01_2014.RDS"
"C:/Users/gigar/Documents/GitHub/emmsa/input/test/prices_02_01_2014.RDS"
"C:/Users/gigar/Documents/GitHub/emmsa/input/test/prices_03_01_2014.RDS"
sites <- str_extract(filenames, ("[\\d{4}_\\d{2}_\\d{2}]+"))
[1] "_01_01_2014" "_02_01_2014" "_03_01_2014" "_04_01_2014" "_05_01_2014" "_06_01_2014"
I want to apply the approach described at https://stackoverflow.com/questions/46299777/add-filename-column-to-table-as-multiple-files-are-read-and-bound , for which I need to extract the date from each file name and convert it into a date column for each file. Could you help me improve this code? So far I have been stuck with the results above.
filenames <- list.files(path, full.names = TRUE, pattern = fileptrn, recursive = TRUE)
sites <- str_extract(filenames, "[A-Z]{2}-[A-Za-z0-9]{3}") # same length as filenames
Pass chunk of filename to column name
I am stuck on the regular expression part of the code.
My goal is to get the date from each file name and add it as a date column to every file I have.
I would appreciate your help with this. I know I'm close, but I can't find the solution. Thanks in advance.
Hi, are you after something like this? I have created fake data but it would apply to your problem.
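library(tidyverse)
# fake files (create the folder first so write_csv() has somewhere to write)
dir.create("files_test", showWarnings = FALSE)
write_csv(iris[1:5,], "files_test/iris_01_04_2022.csv")
write_csv(iris[6:10,], "files_test/iris_02_04_2022.csv")
# name each path by itself so map_df() can store it in the .id column
filenames <- fs::dir_ls("files_test") %>%
  set_names()
# read every file, then pull the dd_mm_yyyy chunk out of the path and parse it
map_df(filenames, ~read_csv(.x), .id = "filename") %>%
  mutate(date = str_extract(filename, "\\d{2}_\\d{2}_\\d{4}"),
         date = as.Date(date, format = "%d_%m_%Y"))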
# A tibble: 10 × 7
filename Sepal.Length Sepal.Width Petal.Length Petal.Width Species date
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <date>
1 files_test/iris_01_04_2022.csv 5.1 3.5 1.4 0.2 setosa 2022-04-01
2 files_test/iris_01_04_2022.csv 4.9 3 1.4 0.2 setosa 2022-04-01
3 files_test/iris_01_04_2022.csv 4.7 3.2 1.3 0.2 setosa 2022-04-01
4 files_test/iris_01_04_2022.csv 4.6 3.1 1.5 0.2 setosa 2022-04-01
5 files_test/iris_01_04_2022.csv 5 3.6 1.4 0.2 setosa 2022-04-01
6 files_test/iris_02_04_2022.csv 5.4 3.9 1.7 0.4 setosa 2022-04-02
7 files_test/iris_02_04_2022.csv 4.6 3.4 1.4 0.3 setosa 2022-04-02
8 files_test/iris_02_04_2022.csv 5 3.4 1.5 0.2 setosa 2022-04-02
9 files_test/iris_02_04_2022.csv 4.4 2.9 1.4 0.2 setosa 2022-04-02
10 files_test/iris_02_04_2022.csv 4.9 3.1 1.5 0.1 setosa 2022-04-02
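A side note on why the pattern in the question returned "_01_01_2014" with a leading underscore: wrapping the whole expression in square brackets turns it into a character class, so it matches any run of the listed characters (digits and underscores among them) instead of the sequence "two digits, underscore, two digits, underscore, four digits". Below is a minimal sketch of the un-bracketed pattern. The paths are hypothetical stand-ins for the ones in the question, and the day_month_year order is an assumption based on the answer above.

```r
library(stringr)

# Hypothetical paths mirroring the layout shown in the question
filenames <- c(
  "input/test/prices_01_01_2014.RDS",
  "input/test/prices_02_01_2014.RDS"
)

# No square brackets: match the literal sequence dd_mm_yyyy
date_chr <- str_extract(filenames, "\\d{2}_\\d{2}_\\d{4}")

# Assumes the order is day_month_year; swap %d and %m if it is not
dates <- as.Date(date_chr, format = "%d_%m_%Y")
```

Here `date_chr` holds the date text without the leading underscore, and `dates` is a proper Date vector that can be added as a column.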
williaml:
fs::dir_ls
Thank you very much for your answer @William Lay, but I have one observation: the "filename" column (path) should not appear in the final table; only the date variable should. I have tried to remove the "filename" column, but then the code does not fully work. Could you please help me fix this?
Finally, the function fs::dir_ls doesn't work for me because I use ".rds" files.
The .rds files shouldn't be a problem.
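library(tidyverse)
# fake files
dir.create("files_test", showWarnings = FALSE)
write_rds(iris[1:3,], "files_test/iris_01_04_2022.rds")
write_rds(iris[4:6,], "files_test/iris_02_04_2022.rds")
# glob keeps only the .rds files, in case the folder also holds other file types
filenames <- fs::dir_ls("files_test", glob = "*.rds") %>%
  set_names()
new_file <- map_df(filenames, ~read_rds(.x), .id = "filename") %>%
  mutate(date = str_extract(filename, "\\d{2}_\\d{2}_\\d{4}"),
         date = as.Date(date, format = "%d_%m_%Y")) %>%
  # drop the path column once the date has been extracted
  select(-filename)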
# output --------------------------
> new_file
Sepal.Length Sepal.Width Petal.Length Petal.Width Species date
1 5.1 3.5 1.4 0.2 setosa 2022-04-01
2 4.9 3.0 1.4 0.2 setosa 2022-04-01
3 4.7 3.2 1.3 0.2 setosa 2022-04-01
4 4.6 3.1 1.5 0.2 setosa 2022-04-02
5 5.0 3.6 1.4 0.2 setosa 2022-04-02
6 5.4 3.9 1.7 0.4 setosa 2022-04-02
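If you would rather keep list.files() from the question instead of fs::dir_ls(), the same idea should carry over. This is only a sketch under assumptions: `demo_dir` and the fake `prices_*.RDS` files are hypothetical, created here so the code runs on its own, and the date order is again assumed to be day_month_year. Note that the `pattern` argument of list.files() is a regular expression, so the sketch uses "\\.RDS$" to match the file extension.

```r
library(purrr)
library(dplyr)
library(stringr)

# Hypothetical demo folder with two fake .RDS files
demo_dir <- file.path(tempdir(), "prices_demo")
dir.create(demo_dir, showWarnings = FALSE)
saveRDS(iris[1:3, ], file.path(demo_dir, "prices_01_01_2014.RDS"))
saveRDS(iris[4:6, ], file.path(demo_dir, "prices_02_01_2014.RDS"))

# pattern is a regular expression: match names ending in ".RDS"
filenames <- list.files(demo_dir, pattern = "\\.RDS$",
                        full.names = TRUE, recursive = TRUE)

prices <- filenames %>%
  set_names() %>%                  # name each path by itself
  map(readRDS) %>%                 # read every file into a list
  bind_rows(.id = "filename") %>%  # stack them, keeping the path in a column
  mutate(date = as.Date(
    str_extract(basename(filename), "\\d{2}_\\d{2}_\\d{4}"),
    format = "%d_%m_%Y"            # assumed day_month_year order
  )) %>%
  select(-filename)                # keep only the date, not the path
```

Using basename() before extracting keeps the pattern from accidentally matching digits in a parent folder name.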
Thank you very much, this is the code I was looking for @williaml
Thanks, it's OK. Gino Garibotto