I would like some help from you regarding decreasing the processing time for coef
calculation. The idea of this code is to generate a coef
value for all Id
, date
and Category
, but as seen when date2 > date1
(subset_df1
).
As you can see I have a database with 900 obs
, which is not much and besides that I generate coef
value just for date2 > date1
(subset_df1
), leaving only 230 obs. When I generate coef
for all data from subset_df1
it took an average of 73.82 sec elapsed
. In my opinion, this is time consuming, as there is little data. And my idea is to use a much larger database. How can I decrease this processing time?
library(tidyverse)
library(lubridate)
library(data.table)
library(tictoc)
df1 <- data.frame( Id = rep(1:5, length=900),
date1 = as.Date( "2021-12-01"),
date2= rep(seq( as.Date("2021-01-01"), length.out=450, by=1), each = 2),
Category = rep(c("ABC", "EFG"), length.out = 900),
Week = rep(c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
"Saturday", "Sunday"), length.out = 900),
DR1 = sample( 200:250, 900, repl=TRUE),
setNames( replicate(365, { sample(0:900, 900)}, simplify=FALSE),
paste0("DRM0", formatC(1:365, width = 2, format = "d", flag = "0"))))
return_values <- function (df1,idd,dmda, CategoryChosse) {
# First idea: Calculate the median of the values resulting from the subtraction between DR1 and the values of the DRM columns
dt1 <- as.data.table(df1)
cols <- grep("^DRM0", colnames(dt1), value = TRUE)
med <-
dt1[, (paste0(cols, "_PV")) := DR1 - .SD, .SDcols = cols
][, lapply(.SD, median), by = .(Id, Category, Week), .SDcols = paste0(cols, "_PV") ]
# Second idea: After obtaining the median, I add the values found with the values of the DRM columns of my df1 database.
f2 <- function(nm, pat) grep(pat, nm, value = TRUE)
nm1 <- f2(names(df1), "^DRM0\\d+$")
nm2 <- f2(names(med), "_PV")
nm3 <- paste0("i.", nm2)
setDT(df1)[med,(nm2) := Map(`+`, mget(nm1), mget(nm3)), on = .(Id, Category, Week)]
SPV <- df1[, c('Id','date1', 'date2', 'Week','Category', nm2), with = FALSE]#%>%data.frame
# Third idea: Coef values
coef<-SPV %>%
filter(Id==idd, date2 == ymd(dmda), Category == CategoryChosse) %>%
pull(as.numeric(ymd(dmda)-ymd(min(df1$date1)))+6)
return(coef)
}
subset_df1 <- subset(df1, date2 > date1)
tic()
subset_df1 %>%
rowwise %>%
mutate(return_values(df1,Id, date2, Category)) %>%
select(-c(Week,starts_with('DR')))
toc()
# A tibble: 230 x 5
# Rowwise:
Id date1 date2 Category `return_values(df1, Id, date2, Category)`
<int> <date> <date> <chr> <dbl>
1 1 2021-12-01 2021-12-02 ABC 206
2 2 2021-12-01 2021-12-02 EFG 22
3 3 2021-12-01 2021-12-03 ABC -2
4 4 2021-12-01 2021-12-03 EFG 328
5 5 2021-12-01 2021-12-04 ABC 148
6 1 2021-12-01 2021-12-04 EFG 569
7 2 2021-12-01 2021-12-05 ABC -375
8 3 2021-12-01 2021-12-05 EFG 216
9 4 2021-12-01 2021-12-06 ABC 406
10 5 2021-12-01 2021-12-06 EFG 217
# ... with 220 more rows
> toc()
73.82 sec elapsed