Selecting latest Excel file from a folder

Hello,

If I have a bunch of Excel sheets in a folder and all contain same data except it gets updated every month with same name ending with month & year.

Instead of reading exact file with its name using

read_excel("\Folder\file_202305.xlsx"). 

Is there a tidyverse way that file ending with latest year & month gets picked up without specifying the full file name?

It would also be great to know, if we had different names all together, but just based on latest updated date, we can read the latest file regardless of its name.

Thanks for the help!

You can get a data frame of information about the files in a directory like this:

Files <- list.files(pattern = "xlsx$")
INFO <- file.info(Files)

The data frame INFO has a column named mtime which stores the last time each file was modified. You could find the most recent file with that.

Thanks @FJCC !
This gives me the list of all files. I am interested in extracting only the latest file.

Example, folder contains data_202304.xlsx and data_202305.xlsx.

List file will list both the excel files. I would like to read only data_202305 as that is the latest version.

If I write the following, it lists the latest version.

files <- list.files("Folder Name/", pattern = "202305.xlsx$")

But If I give assign a variable like shown below, then it doesn't pick up. The version will change each month. Thus, having the variable would help.

latest_data <- 202305
files <- list.files("Folder Name/", pattern = "latest_data.xlsx$")

Plus, how can we read the file once picked up from the list here.

Thanks again!

You can get the name of the file with the most recent date of modification like this:

Files <- list.files(pattern = "xlsx$")
INFO <- file.info(Files)
MaxTime <- max(INFO$mtime)
LastFile <- rownames(INFO[INFO$mtime == MaxTime,])

If your file names are as systematic as the ones in your example, you can use the max() function to get the file with the largest date value.

Files <- list.files(pattern = "xlsx$")
MaxFile <- max(Files)

Once you have the file name you can use a function from the readxl package to load the data.

library(readxl)
ExcelData <- read_xlsx(LastFile)
#Or
ExcelData <- read_xlsx(MaxFile)

Thanks again @FJCC !

This is very helpful!

I am still struggling to read these files though.

library(readxl)
# Option2 with systematic file names
Files <- list.files("Excel Files/", pattern = "xlsx$")
MaxFile <- max(Files) 
# We do get the latest file here. But while reading the file like below, we get the error
ExcelData <- read_xlsx(MaxFile)
# Error: `path` does not exist: ‘Test Data_202305.xlsx’
# Path is already listed above as "Excel Files/"

## Option1 - File with most recent date of modification

Files <- list.files("Excel Files/", pattern = "xlsx$")
INFO <- file.info(Files)
# INFO shows blank columns for each excel file. So, mtime is NA

MaxTime <- max(INFO$mtime)
# MaxTime is thus, blank

LastFile <- rownames(INFO[INFO$mtime == MaxTime,])
# LastFile shows "NA", "NA.1" because of blank columns in INFO

ExcelData <- read_xlsx(LastFile)
# Same error of Path doesn't exist as in Option2

The read_xlsx() function is looking for the file in the working directory and not in Excel_Files/. Setting the path argument in list.files() does not set the path in other functions. To construct the correct file name, you can use code like

FullFileName <- paste0("Excel Files/", LastFile)
ExcelData <- read_xlsx(FullFileName)

Thanks again @FJCC !

I still have the same error on path. Path works for list and picks up the file. But struggling to read that file.

Does entering the file path and name manually work?

ExcelData <- read_xlsx("Excel Files/data_202305.xlsx")

If not, find a manually entered file path and name that does work so you know what you have to build up using the paste0() function.

Yes, It does work manually.

Thanks again!

Apparently, the previous suggestion works now. For some reason it was not working last time.

Thanks a bunch @FJCC !!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.