iterate readxl over multiple files

I have ~ 80 .xls files that are generated from image analysis and all have the exact same columns and sheets (with varying numbers of rows). I want to be able to concatenate (bind_rows) one column from one specific sheet from a subset of the excels, but I am new to purrr and I'm struggling with making my function recognize the range and sheet arguments.

Link to example excel

Loading one excel gets the desired result: the first column from the correct sheet within the excel.

library(tidyverse)
library(readxl)
excels_dir <- "rab6_paper_excels/timecourse/saved" #path
test_one_excel <- read_xls(path = file.path(excels_dir, "MH01_Dox8hr_15Jan2019_R3D_D3D_cell1.xls"), sheet = "Area", range = cell_cols(1))
test_one_excel
#> # A tibble: 51 x 1
#>    Area               
#>    <chr>              
#>  1 Value              
#>  2 0.40608701109886169
#>  3 0.33521801233291626
#>  4 0.93611299991607666
#>  5 1.1068500280380249 
#>  6 0.25582000613212585
#>  7 0.64462101459503174
#>  8 0.58971697092056274
#>  9 0.98017400503158569
#> 10 0.53212398290634155
#> # ... with 41 more rows

Created on 2020-07-13 by the reprex package (v0.3.0)

But when I try to use map_dfr, readxl thinks there is only one sheet in the data.

library(tidyverse)
library(readxl)
excels_dir <- "rab6_paper_excels/timecourse/saved" #path
load_excels <- function(directory, pattern = "*.xls", sheet_sel = 1, ...) {
  files_list <- list.files(directory, pattern = {{ pattern }}, ignore.case = TRUE)
  files_list <- setNames(files_list, files_list)
  map_dfr(.x = files_list, ~ read_xls(path = file.path(excels_dir, .x), ...), sheet = sheet_sel, range = cell_cols(1), .id = "source_file")
}

test_fxn_mult <- load_excels(excels_dir, pattern = "*.xls", sheet_sel = 2) 
#> Error: Can't retrieve sheet in position 2, only 1 sheet(s) found.
test_fxn_mult
#> Error in eval(expr, envir, enclos): object 'test_fxn_mult' not found

Created on 2020-07-13 by the reprex package (v0.3.0)

The same error is returned if I use lapply.

I'm really at a loss here. How do I make this work?

can you manually select the second sheet (in the same manner as you are attempting inside the map) ?

 read_xls(path = file.path(excels_dir, "MH01_Dox8hr_15Jan2019_R3D_D3D_cell1.xls"), sheet = 2, range = cell_cols(1))

Yes, here sheet = "Area" and sheet = 2 are equivalent. My apologies, I forgot to make that consistent when making the reprex.

I would prefer to use sheet = "Area" rather than manually selecting the sheet by position, because the number/order of sheets generated by the image analysis software is not universal, but the sheet names are always consistent.

ARGH I found the issue - one of the excel files hadn't been properly exported, so it only had one sheet. Thanks for your help @nirgrahamuk and I'm sorry I wasted both of our time.

At least it works?

all's well that ends well :slight_smile:

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.