How to aggregate 15-min data to 30-min, 45-min and so on?

meitei · May 8, 2022, 10:25pm

I have a dataset that was recorded at 15-min intervals (non-continuous).
How can I aggregate this dataset to 30-min, 45-min, 1-hr and 2-hr intervals?

value <- c(1.869, 1.855, 1.855, 1.855, 1.855, 1.855, 1.855, 1.848, 1.848, 1.848, 1.848, 1.848, 1.848, 1.849)
date <- c("2013-08-28 08:00:00 UTC", "2013-08-28 08:15:00 UTC", 
          "2013-08-28 08:45:00 UTC", "2013-08-28 09:15:00 UTC", 
          "2013-09-28 09:45:00 UTC", "2013-08-28 10:15:00 UTC", 
          "2013-08-28 10:30:00 UTC", "2013-08-28 10:45:00 UTC",
          "2013-08-28 11:00:00 UTC", "2013-08-28 11:45:00 UTC", 
          "2013-08-28 12:00:00 UTC", "2013-08-29 00:15:00 UTC", 
          "2013-08-29 01:00:00 UTC", "2013-08-29 01:15:00 UTC")
data <- data.frame(date=date,value=value)

Thank you..!!!

williaml · May 8, 2022, 11:21pm

Hi, something like this? This is for 30 minutes; for the other intervals, change the unit passed to ceiling_date() (for example "1 hour" or "2 hours").

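library(tidyverse)
library(lubridate)
data %>% 
  # parse the character timestamps, then round each one up to the next 30-min mark
  mutate(date = ymd_hms(date),
         time_30min = ceiling_date(date, "30 minutes")) %>% 
  group_by(time_30min) %>% 
  # total of value within each 30-min bin
  summarise(value = sum(value))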

# A tibble: 11 × 2
   time_30min          value
   <dttm>              <dbl>
 1 2013-08-28 08:00:00  1.87
 2 2013-08-28 08:30:00  1.86
 3 2013-08-28 09:00:00  1.86
 4 2013-08-28 09:30:00  1.86
 5 2013-08-28 10:30:00  3.71
 6 2013-08-28 11:00:00  3.70
 7 2013-08-28 12:00:00  3.70
 8 2013-08-29 00:30:00  1.85
 9 2013-08-29 01:00:00  1.85
10 2013-08-29 01:30:00  1.85
11 2013-09-28 10:00:00  1.86
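
For the other intervals, here is a minimal sketch of the same pipeline with the rounding unit turned into an argument. The helper name `aggregate_ceiling` and the column name `period_end` are made up for illustration, and I load dplyr instead of the whole tidyverse only to keep the example small. It assumes the `date` column holds strings that `ymd_hms()` can parse, as in the sample data.

```r
library(dplyr)
library(lubridate)

# Sample data copied from the question
value <- c(1.869, 1.855, 1.855, 1.855, 1.855, 1.855, 1.855,
           1.848, 1.848, 1.848, 1.848, 1.848, 1.848, 1.849)
date <- c("2013-08-28 08:00:00 UTC", "2013-08-28 08:15:00 UTC",
          "2013-08-28 08:45:00 UTC", "2013-08-28 09:15:00 UTC",
          "2013-09-28 09:45:00 UTC", "2013-08-28 10:15:00 UTC",
          "2013-08-28 10:30:00 UTC", "2013-08-28 10:45:00 UTC",
          "2013-08-28 11:00:00 UTC", "2013-08-28 11:45:00 UTC",
          "2013-08-28 12:00:00 UTC", "2013-08-29 00:15:00 UTC",
          "2013-08-29 01:00:00 UTC", "2013-08-29 01:15:00 UTC")
data <- data.frame(date = date, value = value)

# Hypothetical helper (not part of lubridate or dplyr): sum `value`
# per bin, where each timestamp is rounded up to the next multiple of `unit`.
aggregate_ceiling <- function(df, unit) {
  df %>%
    mutate(date = ymd_hms(date),
           period_end = ceiling_date(date, unit)) %>%
    group_by(period_end) %>%
    summarise(value = sum(value))
}

by_30min <- aggregate_ceiling(data, "30 minutes")
by_1hr   <- aggregate_ceiling(data, "1 hour")
by_2hr   <- aggregate_ceiling(data, "2 hours")
```

Because `ceiling_date()` rounds up, each row is labelled with the end of its bin; swapping in `floor_date()` would label it with the start instead.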

meitei · May 9, 2022, 3:31am

Thank you so much @williaml.
The code worked for the other time scales too, except for 45-min.

For 45 minutes I got this:

> data %>% 
+   mutate(date = ymd_hms(date),
+          time_45min = ceiling_date(date, "45 minutes")) %>% 
+   group_by(time_45min) %>% 
+   summarise(value = sum(value))
# A tibble: 11 × 2
   time_45min          value
   <dttm>              <dbl>
 1 2013-08-28 08:00:00  1.87
 2 2013-08-28 08:45:00  3.71
 3 2013-08-28 09:45:00  1.86
 4 2013-08-28 10:45:00  5.56
 5 2013-08-28 11:00:00  1.85
 6 2013-08-28 11:45:00  1.85
 7 2013-08-28 12:00:00  1.85
 8 2013-08-29 00:45:00  1.85
 9 2013-08-29 01:00:00  1.85
10 2013-08-29 01:45:00  1.85
11 2013-09-28 09:45:00  1.86

I was expecting to get something like this:

   time_45min          value
   <dttm>              <dbl>
 1 2013-08-28 08:00:00  xx
 2 2013-08-28 08:45:00  xx
 3 2013-08-28 09:30:00  xxx
 4 2013-08-28 10:15:00  xxx
...

williaml · May 9, 2022, 3:51am

Sorry, what were you expecting the output to be? It is unclear from your post.

meitei · May 9, 2022, 4:00am

I was expecting the output to be at 45-min intervals.

   time_45min          value
   <dttm>              <dbl>
 1 2013-08-28 08:00:00  xx
 2 2013-08-28 08:45:00  xx
 3 2013-08-28 09:30:00  xxx
 4 2013-08-28 10:15:00  xxx

Here, the time difference between consecutive rows is 45 minutes.
Your script's result does land on 45-minute marks, but the rows are mostly 1 hour apart,
such as:

   time_45min          value
   <dttm>              <dbl>
 1 2013-08-28 08:00:00  1.87
 2 2013-08-28 08:45:00  3.71
 3 2013-08-28 09:45:00  1.86
 4 2013-08-28 10:45:00  5.56

Hope this clears it up.
Thank you

williaml · May 9, 2022, 4:49am

I see. Not sure about that one.

Galangjs · May 13, 2022, 2:40am

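I think this might be what you're looking for. Judging by the output above, ceiling_date() with "45 minutes" appears to round within each clock hour (to :00 and :45), so this instead counts 45-minute periods from the first timestamp.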

library(tidyverse)
library(lubridate)
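data %>%
  mutate(date = ymd_hms(date),
         # minutes elapsed since the earliest timestamp
         min_since_start = interval(min(date), date) %>% time_length('minute'),
         # index of the 45-min period each row falls into (0, 1, 2, ...)
         period = floor(min_since_start / 45),
         # start time of that period
         period_start = min(date) + minutes(period * 45)) %>%
  group_by(period_start) %>%
  summarise(value = sum(value))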
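
The same idea can be wrapped in a function so the bin width is not hard-coded. This is only a sketch: the name `aggregate_from_start` and the argument `width_min` are made up for illustration, and it assumes the timestamps are in UTC (as in the sample data) and that bins should be counted from the earliest timestamp.

```r
library(dplyr)
library(lubridate)

# Sample data copied from the question
value <- c(1.869, 1.855, 1.855, 1.855, 1.855, 1.855, 1.855,
           1.848, 1.848, 1.848, 1.848, 1.848, 1.848, 1.849)
date <- c("2013-08-28 08:00:00 UTC", "2013-08-28 08:15:00 UTC",
          "2013-08-28 08:45:00 UTC", "2013-08-28 09:15:00 UTC",
          "2013-09-28 09:45:00 UTC", "2013-08-28 10:15:00 UTC",
          "2013-08-28 10:30:00 UTC", "2013-08-28 10:45:00 UTC",
          "2013-08-28 11:00:00 UTC", "2013-08-28 11:45:00 UTC",
          "2013-08-28 12:00:00 UTC", "2013-08-29 00:15:00 UTC",
          "2013-08-29 01:00:00 UTC", "2013-08-29 01:15:00 UTC")
data <- data.frame(date = date, value = value)

# Hypothetical helper: sum `value` in bins of `width_min` minutes,
# counted from the earliest timestamp in `df`.
aggregate_from_start <- function(df, width_min) {
  df %>%
    mutate(date = ymd_hms(date),
           # number of whole bins elapsed since the first observation
           bin = floor(time_length(interval(min(date), date), "minute") / width_min),
           # start time of the bin each row belongs to
           period_start = min(date) + minutes(bin * width_min)) %>%
    group_by(period_start) %>%
    summarise(value = sum(value))
}

by_45min <- aggregate_from_start(data, 45)
```

With the sample data the first two bins start at 08:00 and 08:45. Bins that contain no observations (the one starting at 09:30, for instance) simply do not appear as rows, so consecutive rows are multiples of 45 minutes apart rather than always exactly 45 minutes.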