how to combine multiple excel files having multiple worksheets into a single excel file with multiple worksheets using Rstudio

cizere · March 24, 2021, 3:03pm

I have multiple file names, for example, C1.xlsx; C2.xlsx; C3.xlsx; C4.xlsx, etc. where each file is having multiple worksheets such as C_1; C_2; C_3 and so... on i.e., worksheet names in all the files are same and the number of sheets are equal in all the files. Now, I need to combine all the worksheets having similar worksheet names from all the files. In addition, the column names in each worksheet are same. I'm learning how to code with R.

Can someone help me with this?

Thanks

cactusoxbird · March 24, 2021, 5:39pm

There's a few ways that you can do this, but I like to use the purrr package for tasks like this. Here's a couple of options for doing so.

First, I'd get a vector of file locations:

library(tidyverse)
library(readxl)

# Get the paths to each file:
files <- list.files(
 # Currently set to working directory location
 path = ".",
 # Look for Excel workbooks
 pattern = ".xlsx",
 # Include the full path to the file                  
 full.names = TRUE)

One option would then be to read in the data for one sheet at a time (i.e., all files, but same sheet from each file):

# Store the names of the sheets we'll be accessing, using one of the files
# as the template for the sheet names
all_sheets <- excel_sheets(files[1])

# .x is our vector of file paths. map_df gives us a single data frame/tibble as its output
# We're saying "for each file path..."
first_sheet_contents <- map_df(.x = files,
                               # "...read the xlsx file, but only for the the sheet with this name"
                               .f = ~ read_excel(path = .x,
                                                 sheet = all_sheets[1]))

# Here's an example of the output from multiple sheets condensed into one data frame 
first_sheet_contents 

# # A tibble: 4 x 2
#   first_col second_col
#      <dbl> <chr>     
# 1         1 a         
# 2         4 b         
# 3         8 a         
# 4         5 f

# And then you could do the same for the second sheet, etc...
second_sheet_contents <- map_df(.x = files,
                                .f = ~ read_excel(path = .x,
                                                  sheet = all_sheets[2]))

An alternative option might be to read all sheets from all files in one step. We'll get a list as our output. Within each list item will be a sublist. Main list items correspond to files, and their sub-items correspond to sheets in the file. For example:

all_file_contents <- map(.x = files,
                         .f = ~ {
                           
                           temp_file <- .x
                           
                           map(.x = excel_sheets(.x),
                               .f = ~ read_excel(path = temp_file,
                                                 sheet = .x) %>%
                                # Include sheet name. Not required,
                                # but might be helpful for checking
                                 mutate(sheet_name = .x))
                           
                         })

all_file_contents 

# [[1]] # This is file one...
# [[1]][[1]] 
# # A tibble: 2 x 3
#   first_col second_col sheet_name
#       <dbl> <chr>      <chr>     
# 1         1 a          C_1       
# 2         4 b          C_1       

# [[1]][[2]]
# # A tibble: 2 x 3
#   first_col second_col sheet_name
#       <dbl> <chr>      <chr>     
# 1         1 d          C_2       
# 2         6 b          C_2       


# [[2]] # This is file two...
# [[2]][[1]]
# # A tibble: 2 x 3
#   first_col second_col sheet_name
#       <dbl> <chr>      <chr>     
# 1         8 a          C_1       
# 2         5 f          C_1       

# [[2]][[2]]
# # A tibble: 2 x 3
#   first_col second_col sheet_name
#       <dbl> <chr>      <chr>     
# 1         5 a          C_2       
# 2         6 b          C_2

You can invert the list so that you have data from the same sheet IDs grouped together

transpose(all_file_contents) %>%
  map(.x = .,
      .f = ~ reduce(.x = .x, .f = bind_rows))

# [[1]]
# # A tibble: 4 x 3
#   first_col second_col sheet_name
#       <dbl> <chr>      <chr>     
# 1         1 a          C_1       
# 2         4 b          C_1       
# 3         8 a          C_1       
# 4         5 f          C_1       

# [[2]]
# # A tibble: 4 x 3
#   first_col second_col sheet_name
#       <dbl> <chr>      <chr>     
# 1         1 d          C_2       
# 2         6 b          C_2       
# 3         5 a          C_2       
# 4         6 b          C_2

OR, you could just read all the sheets in as a single data frame:

transpose(all_file_contents) %>%
  map_df(.x = .,
         .f = ~ .x)

# # A tibble: 8 x 3
#   first_col second_col sheet_name
#       <dbl> <chr>      <chr>     
# 1         1 a          C_1       
# 2         4 b          C_1       
# 3         8 a          C_1       
# 4         5 f          C_1       
# 5         1 d          C_2       
# 6         6 b          C_2       
# 7         5 a          C_2       
# 8         6 b          C_2

^{Created on 2021-03-24 by the reprex package (v1.0.0)}

cizere · March 25, 2021, 7:47am

Thank you so much. I will apply this on my data.

system · April 15, 2021, 7:47am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.