Unir hojas de calculo de Excel en un solo data frame que tienen encabezado

Tengo un libro de Excel formato .XLSX, el cual contiene 25 hojas de calculo (Inicio,1,2,3,4,5,6,7,8,9,10,11,12, 1D, 2D, 3D, 4D, 5D, 6D, 7D, 8D, 9D, 10D, 11D, 12D)

Yo deseo consolidar las siguientes hojas de calculo: 1D, 2D, 3D, 4D, 5D, 6D, 7D, 8D, 9D, 10D, 11D, 12D en un solo data frame

¿Cómo lo hago?

Adjunto enlace del archivo Excel:

This should get you started. It reads A4:BD289 from each sheet and combines them in one data frame. You can then set the column names to whatever you want.

library(openxlsx)
library(purrr)
library(dplyr)
FileName <- "~/R/Play/BDD_esquema_regular_2015.xlsx"
SheetNames <- paste0(1:12, "D")
AllSheets <- map(SheetNames, function(Nm) read.xlsx(FileName, sheet = Nm, 
                                                    colNames = FALSE, 
                                                    rows = 4:289, cols = 1:56))
AllData <- bind_rows(AllSheets)

Gracias por tu respuesta,
Una consulta: Si tengo hojas de calculo con diferentes números de filas, ¿cómo haría?
Ejemplo:
Hoja 1D tenga 500 filas
Hoja 2D tiene 100 filas

I used the arguments rows = 4:289, cols = 1:56 to solve two problems. First, the header data in rows 1:3 have a complex layout that does not translate well to a data frame. Second, sheet 9D has some extra data beyond column BD.
Instead of specifying the exact rows to read, you can use the startRow argument to skip the first three rows. The read.xlsx() function will then read all of the rows starting with row 4

library(openxlsx)
library(purrr)
library(dplyr)
FileName <- "~/R/Play/BDD_esquema_regular_2015.xlsx"
SheetNames <- paste0(1:12, "D")
AllSheets <- map(SheetNames, function(Nm) read.xlsx(FileName, sheet = Nm, 
                                                    colNames = FALSE, 
                                                    startRow = 4, cols = 1:56))
AllData <- bind_rows(AllSheets)
1 Like

Tal vez me puedes explicar de donde sacas Nm?
En la parte que dice function(Nm) :pleading_face:

Nm is the name of the argument in the function I wrote to get data from one sheet of the file. Here is a modified version of my code where I moved the definition of the function outside of the map() function. I define a function named GetData that takes one argument named Nm. Nm will be used as the value of the sheet argument in read.xlsx(). In the code below, I added an extra step where GetData() is used to read the sheet named 1D. That step serves no purpose except to illustrate the use of GetData().
In the line AllSheets <- map(SheetNames, GetData), the map() function steps through the SheetNames vector and runs GetData() with each value in SheetNames. First it runs GetData(Nm = "1D"), then GetData(Nm = "2D"), etc.
The result of running GetData(Nm = "1D") is
read.xlsx(FileName, sheet = "1D", colNames = FALSE, startRow = 4, cols = 1:56)

library(openxlsx)
library(purrr)
library(dplyr)
FileName <- "~/R/Play/BDD_esquema_regular_2015.xlsx"
SheetNames <- paste0(1:12, "D")

#Define a function to get data from one sheet
GetData <- function(Nm) read.xlsx(FileName, sheet = Nm, 
                                  colNames = FALSE, 
                                  startRow = 4, cols = 1:56)

#Get data from one sheet, just as an example
Sheet1D <- GetData(Nm = "1D")

#Get data from all the sheets
AllSheets <- map(SheetNames, GetData)
AllData <- bind_rows(AllSheets)
1 Like

Una última consulta:
Para poder unir las hojas, tube que utilizar rbind.fill ya que con rbind_rows me daba un mensaje de error, entonces quedó así:
AllData <- rbind.fill(AllSheets)

Pero ahora quiero agregar una columna con el nombre de cada hoja de excel, para poder identificar la fuente de donde se extrajo el dato?

I do not see an error running the following code and using the data you provided. I changed the code to add a column with the sheet name.

library(openxlsx)
library(purrr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
FileName <- "~/R/Play/BDD_esquema_regular_2015.xlsx"
SheetNames <- paste0(1:12, "D")

#Define a function to get data from one sheet
GetData <- function(Nm) read.xlsx(FileName, sheet = Nm, 
                                  colNames = FALSE, 
                                  startRow = 4, cols = 1:56)

#Get data from all the sheets
AllSheets <- map(SheetNames, GetData)
names(AllSheets) <- SheetNames
AllData <- bind_rows(AllSheets, .id = "Hoja")
AllData[1:5, 1:5]
#>   Hoja      X1       X2    X3     X4
#> 1   1D Bolivar Guaranda 02D01 000125
#> 2   1D Bolivar Guaranda 02D01 000154
#> 3   1D Bolivar Guaranda 02D01 000130
#> 4   1D Bolivar Guaranda 02D01 000126
#> 5   1D Bolivar Guaranda 02D01 000120

Created on 2023-11-22 with reprex v2.0.2

Lo que sucede es que ahora estoy uniendo las hojas de la 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 pero hay un dato que no es númerico y ese no me permite usar el siguiente código:

AllData <- bind_rows(AllSheets, .id = "Hoja")
Error in bind_rows():
! Can't combine 1$X111 and 9$X111 .
Run rlang::last_trace() to see where the error occurred.

Por eso intenté cambiar por el AllData <- rbind.fill(AllSheets) pero este no me permite agregar una columna con el nombre de la hoja de donde se extrae el dato

Looking at the sheet named 1, The startRow argument of read.xlsx() in GetData() should be set to 8. This code does not give me an error.

library(openxlsx)
library(purrr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
FileName <- "~/R/Play/BDD_esquema_regular_2015.xlsx"
SheetNames <- as.character(1:12)
#Define a function to get data from one sheet
GetData <- function(Nm) read.xlsx(FileName, sheet = Nm, 
                                  colNames = FALSE, 
                                  startRow = 8, cols = 1:56)

#Get data from all the sheets
AllSheets <- map(SheetNames, GetData)
names(AllSheets) <- SheetNames
AllData <- bind_rows(AllSheets, .id = "Hoja")
AllData[1:5, 1:7]
#>   Hoja      X1       X2    X3     X4             X5  X6
#> 1    1 Bolivar Guaranda 02D01 000125     4 ESQUINAS  37
#> 2    1 Bolivar Guaranda 02D01 000154     CACHISAGUA  15
#> 3    1 Bolivar Guaranda 02D01 000130 CORDERO CRESPO 104
#> 4    1 Bolivar Guaranda 02D01 000126   EL RAYO BAJO   3
#> 5    1 Bolivar Guaranda 02D01 000120   FACUNDO VELA  14

Created on 2023-11-22 with reprex v2.0.2

Hola buen día,

Gracias por tu respuesta, la valoro mucho!
En el argumento cols debe ser = 1:132, cambiando este dato es que me da error :pleading_face:
GetData <- function(Nm) read.xlsx(FileName, sheet = Nm,
colNames = FALSE,
startRow = 8, cols = 1:132)

El error que se genera es porque hay una columna que tienen un dato que no es número, te agradezco tu gran ayuda

This version of the code uses as.numeric() to coerce any text in columns 6:132 into NA. It throws one warning when it coerces the text in sheet 9 to NA.

library(openxlsx)
library(purrr)
library(dplyr)

FileName <- "~/R/Play/BDD_esquema_regular_2015.xlsx"
SheetNames <- as.character(1:12)
#Define a function to get data from one sheet
GetData <- function(Nm) {
  tmp <- read.xlsx(FileName, sheet = Nm, 
                                  colNames = TRUE, 
                                  startRow = 7, cols = 1:132)
  tmp |> mutate(across(.cols = 6:132, as.numeric))
}
  

#Get data from all the sheets
AllSheets <- map(SheetNames, GetData)
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `across(.cols = 6:132, as.numeric)`.
#> Caused by warning:
#> ! NAs introduced by coercion
names(AllSheets) <- SheetNames
AllData <- bind_rows(AllSheets, .id = "Hoja")
AllData[1:5, 1:7]
#>   Hoja       A        B     C      D              E   1
#> 1    1 Bolivar Guaranda 02D01 000125     4 ESQUINAS  37
#> 2    1 Bolivar Guaranda 02D01 000154     CACHISAGUA  15
#> 3    1 Bolivar Guaranda 02D01 000130 CORDERO CRESPO 104
#> 4    1 Bolivar Guaranda 02D01 000126   EL RAYO BAJO   3
#> 5    1 Bolivar Guaranda 02D01 000120   FACUNDO VELA  14

Created on 2023-11-23 with reprex v2.0.2

1 Like

Me sigue apareciendo este error, no entiendo porque no convierte a número la variable 106 de la lista 9

[1] "D:\\richard.narvaez\\Documents\\Esquema Regular\\BD\\Esquema_regular_2015.xlsx"
> FileName <- "D:\\richard.narvaez\\Documents\\Esquema Regular\\BD\\Esquema_regular_2015.xlsx"
> SheetNames <- as.character(1:12)
> #Define a function to get data from one sheet
> GetData <- function(Nm) {
+   tmp <- read.xlsx(FileName, sheet = Nm, 
+                    colNames = TRUE, 
+                    startRow = 7, cols = 1:132)
+   tmp |> mutate(across(.cols = 6:132, as.numeric))
+ }
> #Get data from all the sheets
> AllSheets <- map(SheetNames, GetData)
> #> Warning: There was 1 warning in `mutate()`.
> #> ℹ In argument: `across(.cols = 6:132, as.numeric)`.
> #> Caused by warning:
> #> ! NAs introduced by coercion
> names(AllSheets) <- SheetNames
> AllData <- bind_rows(AllSheets, .id = "Hoja")
Error in `bind_rows()`:
! Can't combine `1$106` <double> and `9$106` <character>.
Run `rlang::last_trace()` to see where the error occurred.

Try restarting your R session.

Sí, en realidad me tocó apagar la computadora. Ahora si funciona!!
Muchas gracias!

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.