Tengo un libro de Excel formato .XLSX, el cual contiene 25 hojas de calculo (Inicio,1,2,3,4,5,6,7,8,9,10,11,12, 1D, 2D, 3D, 4D, 5D, 6D, 7D, 8D, 9D, 10D, 11D, 12D)
Yo deseo consolidar las siguientes hojas de calculo: 1D, 2D, 3D, 4D, 5D, 6D, 7D, 8D, 9D, 10D, 11D, 12D en un solo data frame
This should get you started. It reads A4:BD289 from each sheet and combines them in one data frame. You can then set the column names to whatever you want.
Gracias por tu respuesta,
Una consulta: Si tengo hojas de calculo con diferentes números de filas, ¿cómo haría?
Ejemplo:
Hoja 1D tenga 500 filas
Hoja 2D tiene 100 filas
I used the arguments rows = 4:289, cols = 1:56 to solve two problems. First, the header data in rows 1:3 have a complex layout that does not translate well to a data frame. Second, sheet 9D has some extra data beyond column BD.
Instead of specifying the exact rows to read, you can use the startRow argument to skip the first three rows. The read.xlsx() function will then read all of the rows starting with row 4
Nm is the name of the argument in the function I wrote to get data from one sheet of the file. Here is a modified version of my code where I moved the definition of the function outside of the map() function. I define a function named GetData that takes one argument named Nm. Nm will be used as the value of the sheet argument in read.xlsx(). In the code below, I added an extra step where GetData() is used to read the sheet named 1D. That step serves no purpose except to illustrate the use of GetData().
In the line AllSheets <- map(SheetNames, GetData), the map() function steps through the SheetNames vector and runs GetData() with each value in SheetNames. First it runs GetData(Nm = "1D"), then GetData(Nm = "2D"), etc.
The result of running GetData(Nm = "1D") is read.xlsx(FileName, sheet = "1D", colNames = FALSE, startRow = 4, cols = 1:56)
library(openxlsx)
library(purrr)
library(dplyr)
FileName <- "~/R/Play/BDD_esquema_regular_2015.xlsx"
SheetNames <- paste0(1:12, "D")
#Define a function to get data from one sheet
GetData <- function(Nm) read.xlsx(FileName, sheet = Nm,
colNames = FALSE,
startRow = 4, cols = 1:56)
#Get data from one sheet, just as an example
Sheet1D <- GetData(Nm = "1D")
#Get data from all the sheets
AllSheets <- map(SheetNames, GetData)
AllData <- bind_rows(AllSheets)
Una última consulta:
Para poder unir las hojas, tube que utilizar rbind.fill ya que con rbind_rows me daba un mensaje de error, entonces quedó así:
AllData <- rbind.fill(AllSheets)
Pero ahora quiero agregar una columna con el nombre de cada hoja de excel, para poder identificar la fuente de donde se extrajo el dato?
Lo que sucede es que ahora estoy uniendo las hojas de la 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 pero hay un dato que no es númerico y ese no me permite usar el siguiente código:
AllData <- bind_rows(AllSheets, .id = "Hoja")
Error in bind_rows():
! Can't combine 1$X111 and 9$X111 .
Run rlang::last_trace() to see where the error occurred.
Por eso intenté cambiar por el AllData <- rbind.fill(AllSheets) pero este no me permite agregar una columna con el nombre de la hoja de donde se extrae el dato
Gracias por tu respuesta, la valoro mucho!
En el argumento cols debe ser = 1:132, cambiando este dato es que me da error
GetData <- function(Nm) read.xlsx(FileName, sheet = Nm,
colNames = FALSE,
startRow = 8, cols = 1:132)
El error que se genera es porque hay una columna que tienen un dato que no es número, te agradezco tu gran ayuda
This version of the code uses as.numeric() to coerce any text in columns 6:132 into NA. It throws one warning when it coerces the text in sheet 9 to NA.
library(openxlsx)
library(purrr)
library(dplyr)
FileName <- "~/R/Play/BDD_esquema_regular_2015.xlsx"
SheetNames <- as.character(1:12)
#Define a function to get data from one sheet
GetData <- function(Nm) {
tmp <- read.xlsx(FileName, sheet = Nm,
colNames = TRUE,
startRow = 7, cols = 1:132)
tmp |> mutate(across(.cols = 6:132, as.numeric))
}
#Get data from all the sheets
AllSheets <- map(SheetNames, GetData)
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `across(.cols = 6:132, as.numeric)`.
#> Caused by warning:
#> ! NAs introduced by coercion
names(AllSheets) <- SheetNames
AllData <- bind_rows(AllSheets, .id = "Hoja")
AllData[1:5, 1:7]
#> Hoja A B C D E 1
#> 1 1 Bolivar Guaranda 02D01 000125 4 ESQUINAS 37
#> 2 1 Bolivar Guaranda 02D01 000154 CACHISAGUA 15
#> 3 1 Bolivar Guaranda 02D01 000130 CORDERO CRESPO 104
#> 4 1 Bolivar Guaranda 02D01 000126 EL RAYO BAJO 3
#> 5 1 Bolivar Guaranda 02D01 000120 FACUNDO VELA 14