Hi everybody, I have almost 500 .xlsx files that I want to load to R but I need to make a lot of changes to them once they´re on R because there are a lot of rows and columns that I don´t need. At the end they must be unite with rbind()
to get one data.frame
.
One of these changes is to create a column with the date of the file. The thing is that the date is located in a certain cell datos[2,3]
so I take it into an object named fecha.dia
and transform it to get a date object. But it´s an object with just one element. What I do is create a new variable in the data.frame
call Fecha
and I put into it the object fecha.dia
so it fills the whole column with the date.
Other thing is that the .xlsx files don´t have the same number of columns, that´s why I turn it into a long format and then do the rbind()
.
When I run the code without the loop, just for one file, it worked.
This is a sample of one file, it' really a small sample because I can't copy all the columns, I think it's to big.
datos <- tibble::tribble(
~DESPACHO.ECONOMICO.DIARIO.PROGRAMADO, ~`HORA.PUBLICACION.WEB:`, ~`14:25`,
NA, NA, NA,
NA, NA, "miercoles, 1 de enero de 2020",
"CCSIH", "PUCAH", "NORMH",
NA, NA, NA,
NA, NA, NA,
"991.9", "20.0", "49.6",
"926.7", "20.0", "49.6",
"749.5", "20.0", "49.6",
"530.9", "20.0", "49.6",
"464.6", "20.0", "49.6",
"356.6", "20.0", "49.6",
"200.0", "20.0", "49.6",
"200.0", "20.0", "49.6",
"205.1", "20.0", "49.6",
"429.1", "20.0", "49.6",
"705.7", "20.0", "49.6",
"908.6", "20.0", "49.6",
"917.1", "20.0", "49.6"
)
head(datos)
#> # A tibble: 6 x 3
#> DESPACHO.ECONOMICO.DIARIO.PROGR~ `HORA.PUBLICACION.WE~ `14:25`
#> <chr> <chr> <chr>
#> 1 <NA> <NA> <NA>
#> 2 <NA> <NA> miercoles, 1 de enero ~
#> 3 CCSIH PUCAH NORMH
#> 4 <NA> <NA> <NA>
#> 5 <NA> <NA> <NA>
#> 6 991.9 20.0 49.6
Here is my complete code with the loop.
# Cargar Librerias----
library(fs)
library(tidyverse)
library(readxl)
library(readr)
# OBJETOS----
path <- "Datos"
datos <- list()
fecha.dia <- list()
encabezados <- list()
i <- 1
# FOR LOOP----
for (file in list.files(path, pattern = "*.xlsx")) {
datos[[i]] <- read_excel(paste(path, file, sep = "/"),
sheet = "DESPACHO",
n_max = 29)
## 1. Eliminar columnas vacías----
datos[[i]] <- datos[[i]][1:(length(datos[[i]])-6)]
## 2. Tomar fecha de celda----
fecha.dia[[i]] <- datos[[i]][2,3]
fecha.dia[[i]] <- regmatches(fecha.dia[[i]], regexpr(pattern = "/d.*.*/d$", fecha.dia[[i]]))
fecha.dia[[i]] <- gsub(" de ", " ", fecha.dia[[i]])
fecha.dia[[i]] <- readr::parse_date(fecha.dia[[i]], format = "%d %B %Y", locale = locale("es"))
datos[[i]]$Fecha <- fecha.dia[[i]]
## 3. Manejo de encabezados----
encabezados[[i]] <- datos[[i]][3, ]
encabezados[[i]][1,1] <- "HORA"
encabezados[[i]] <- encabezados[[i]] %>%
pivot_longer(1:length(datos[[i]]), "colum", values_to = "unidad")
encabezados[[i]] <- na.omit(encabezados[[i]]["unidad"])
encabezados[[i]] <- encabezados[[i]][["unidad"]]
colnames(datos[[i]]) <- encabezados[[i]]
## 4. Manipular y preparar tabla----
datos[[i]] <- datos[[i]][-c(1:5), ]
datos[[i]]$DEMANDA <- NULL
datos[[i]]$PRECIO <- NULL
datos[[i]]$RSF <- NULL
datos[[i]]$ECUACOLO230 <- NULL
datos[[i]]$ECUACOLO138 <- NULL
datos[[i]]$ECUAPERU230 <- NULL
datos[[i]]$HORA <- seq(0,23,1)
datos[[i]] <- select(datos[[i]], Fecha, everything()) #ordenar tabla
datos[[i]] <- pivot_longer(datos[[i]], 3:length(datos[[i]]), "unidad", values_to = "potencia")
datos[[i]]$potencia <- round(as.numeric(datos[[i]]$potencia),2)
datos[[i]] <- rbind(datos[[i]])
i <- i + 1
}
When I run it I get this error:
Error: Assigned data
fecha.dia[[i]]
must be compatible with existing data.
x Existing data has 29 rows.
x Assigned data has 0 rows.
i Only vectors of size 1 are recycled.
Runrlang::last_error()
to see where the error occurred.
I don´t know how to deal with that object. It's my first loop so I hope you can help me.
Please any advice.