Basically, we need to select the data from the most recent file for each month.
However, the file names have a few discrepancies, so we are using stringr and rebus to clean them.
Can we use this cleaned information to select the most recent file name?
Please find a simplified reprex below:
library(tidyverse)
library(rebus)
myfiles <- tribble(
~files,~last_modified,
"file_2014_01.csv", "2019-07-17T14:00:20.000Z",
"file_2014_01 ", "2019-07-17T14:00:21.000Z",
"file_2014_01.csv", "2019-07-17T13:59:36.000Z",
"file_2014_01fdn.csv", "2019-07-17T14:00:23.000Z",
"file_2014_01.csv", "2019-07-17T14:00:11.000Z",
"file_2014_01.csv", "2019-07-17T14:00:27.000Z", # Most recent
"äsdfile_2014_03.csv", "2019-06-17T14:00:23.000Z",
"qwerfile_2014_03 ", "2019-07-15T14:00:21.000Z",
"file_2014_03.csv", "2019-01-17T13:59:36.000Z",
"bfffile_2014_03fdn.csv", "2019-06-17T14:00:32.000Z",
"cvfile_2014_03.csv", "2019-07-14T14:00:11.000Z",
"uufile_2014_03.csv", "2019-2-17T15:00:23.000Z" # Most recent
)
# Extract the year_month key (e.g. "2014_01") shared by files of the same month
to_group <- myfiles %>%
  pull(files) %>%
  str_extract(pattern = one_or_more(DGT) %R% ANY_CHAR %R% one_or_more(DGT))
# Distinct months to choose from
to_group %>% unique()
# How can we use this info to select the most recent file for each month from myfiles?
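One possible approach, as a minimal sketch rather than a definitive answer: keep the extracted key as a column, parse `last_modified` into a date-time, then take the latest row per group. The names `sample_files`, `month`, `modified_at`, and `most_recent` are made up for illustration, the sample rows are a subset of the reprex, and the plain regex `"\\d+_\\d+"` is swapped in for the rebus pattern (assuming the separator is always an underscore).

```r
library(tidyverse)
library(lubridate)

# A subset of the reprex rows, used here only to keep the sketch self-contained
sample_files <- tribble(
  ~files,                ~last_modified,
  "file_2014_01.csv",    "2019-07-17T14:00:20.000Z",
  "file_2014_01 ",       "2019-07-17T14:00:21.000Z",
  "file_2014_01.csv",    "2019-07-17T14:00:27.000Z",
  "qwerfile_2014_03 ",   "2019-07-15T14:00:21.000Z",
  "file_2014_03.csv",    "2019-01-17T13:59:36.000Z",
  "cvfile_2014_03.csv",  "2019-07-14T14:00:11.000Z"
)

most_recent <- sample_files %>%
  mutate(
    # Grouping key such as "2014_01", extracted from the messy file name
    month = str_extract(files, "\\d+_\\d+"),
    # Parse the ISO 8601 timestamp so rows are compared as date-times, not strings
    modified_at = ymd_hms(last_modified)
  ) %>%
  group_by(month) %>%
  # Keep the single row with the latest timestamp in each month
  slice_max(modified_at, n = 1, with_ties = FALSE) %>%
  ungroup()

most_recent
```

Parsing the timestamp matters because string comparison can misorder values that are not zero-padded, such as `"2019-2-17T15:00:23.000Z"` in the reprex. It may be worth checking `sum(is.na(modified_at))` after parsing, since `ymd_hms()` returns `NA` for strings it cannot parse.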