I have a data table which consists of transaction date, time, account number, amount.
All I need is a cumulative count and amount for each account. The catch is it has to reset cumulative value as soon as it reaches 30 minutes, after this it has to restart from 1.
library(data.table)
#---- fake data ----
times <- seq.POSIXt(
from = as.POSIXct("2021-03-12 08:00:00"),
to = as.POSIXct("2021-03-12 12:00:00"),
by = "sec"
)
accounts <- seq(1111, 9999, by = 1000)
amounts <- seq(1, 5000, by = 1)
dt <- data.table(
time = as.ITime(sample(times, 1500, replace = TRUE)),
account = sample(accounts, 1500, replace = TRUE),
amount = sample(amounts, 1500, replace = TRUE)
)
#---- counts and sums ----
# order by account and time
setkeyv(dt, c("account", "time"))
# new time by each half hour
dt[, half_hour := as.ITime(round(as.double(time)/(30*60))*(30*60))]
# count and sum by half hour
dt[,`:=`(n = seq_len(.N), cumulative = cumsum(amount)) , by = .(account, half_hour)]
I will check with my data on your solution and revert..thans again for your time
Hi Jeremy, this seems to be perfect solution to me..
however i notice half_hour variable getting reset after 15 mins instead of 30 minutes. In the above example reset to start at record number 17 instead of 10.
Thanks for catching that. I think this is working now. I added the lubridate package to manage the 30-minute intervals properly with the floor_date() function.
```{r}
library(data.table)
library(lubridate)
#---- fake data ----
times <- seq.POSIXt(
from = as.POSIXct("2021-03-12 08:00:00"),
to = as.POSIXct("2021-03-12 12:00:00"),
by = "sec"
)
accounts <- seq(1111, 9999, by = 1000)
amounts <- seq(1, 5000, by = 1)
dt <- data.table(
datetime = sample(times, 1500, replace = TRUE),
account = sample(accounts, 1500, replace = TRUE),
amount = sample(amounts, 1500, replace = TRUE)
)
# order by account and datetime
setkeyv(dt, c("account", "datetime"))
# separate into date and time columns
dt[, `:=`(date = as.IDate(datetime), time = as.ITime(datetime))]
dt[, datetime := NULL]
#---- counts and sums ----
# new time by each half hour
dt[, half_hour := floor_date(as.POSIXct(date) + time, "30 minutes")]
# count and sum by half hour
dt[,`:=`(n = seq_len(.N), cumulative = cumsum(amount)) , by = .(account, half_hour)]