Dates missing while changing 5-minute data to 4-minute data


I have a dataset (df1) with 5-minute intervals and I want to convert it into 4-minute data.
The data look like this:

   date                   CO
   <dttm>              <dbl>
 1 2019-05-01 00:00:00   246
 2 2019-05-01 00:05:00   234
 3 2019-05-01 00:10:00   235
 4 2019-05-01 00:15:00   247
 5 2019-05-01 00:20:00   256
 6 2019-05-01 00:25:00   269
 7 2019-05-01 00:30:00   249
 8 2019-05-01 00:35:00   242
 9 2019-05-01 00:40:00   229
10 2019-05-01 00:45:00   234
11 2019-05-01 00:50:00   226
12 2019-05-01 00:55:00   222
13 2019-05-01 01:00:00   238
14 2019-05-01 01:05:00   232
15 2019-05-01 01:10:00   280

To convert/average it into 4-minute intervals, I am using this code:

library(dplyr)
library(lubridate)

df1 %>% 
  arrange(date) %>% 
  distinct(date, .keep_all = TRUE) %>%   # remove duplicate rows based on date
  mutate(hour = floor_date(date, '4 minute')) %>%   # round each timestamp down to a 4-minute mark
  group_by(hour) %>% 
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE))) 

But after applying this code, some timestamps are missing from the result, e.g. 00:16:00 and 00:36:00:

# A tibble: 26,496 × 2
   hour                   CO
   <dttm>              <dbl>
 1 2019-05-01 00:00:00   246
 2 2019-05-01 00:04:00   234
 3 2019-05-01 00:08:00   235
 4 2019-05-01 00:12:00   247
 5 2019-05-01 00:20:00   256
 6 2019-05-01 00:24:00   269
 7 2019-05-01 00:28:00   249
 8 2019-05-01 00:32:00   242
 9 2019-05-01 00:40:00   229
10 2019-05-01 00:44:00   234

Please let me know how to fill in the missing dates.

The data can be found here: Dropbox - NCore_CO_5min.csv


Hi Kunal,

What are you expecting to see in the 00:36:00 row?
It looks like the 00:35:00 row in your 5-minute data is correctly rounded down to 00:32:00, and 00:40:00 stays at 00:40:00 because 40 is divisible by 4, so it wouldn't "round down" to 00:36:00. At the moment, your code doesn't know that you want a "00:36:00" interval, and it wouldn't know what value to put in the "CO" column if there were one.
Do you want to take the average of the 00:35:00 row and the 00:40:00 row and put that in the 00:36:00 row to kind of "smooth out" the interval?
Sorry for answering your question with more questions - this is my first attempt at an answer! But hopefully it gives you something useful to think about until someone more expert comes along.
Cheers and good luck!
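
To make the rounding behaviour concrete, here is a minimal sketch that bins the first nine timestamps from the question with floor_date() and lists the 4-minute marks that receive no reading. The UTC time zone and the names five_min, binned, all_bins and missing_bins are assumptions for illustration only; they are not from the original data.

```r
library(lubridate)

# First nine 5-minute timestamps from the question (UTC assumed for reproducibility)
five_min <- seq(ymd_hms("2019-05-01 00:00:00", tz = "UTC"),
                by = "5 min", length.out = 9)

# Round each timestamp down to a 4-minute mark, as the question's code does
binned <- floor_date(five_min, "4 minutes")

# Every 4-minute mark that exists in the same time range
all_bins <- seq(min(binned), max(binned), by = "4 min")

# 4-minute marks that no 5-minute reading was rounded down to
missing_bins <- all_bins[!all_bins %in% binned]
format(missing_bins, "%H:%M")
# "00:16" "00:36"
```

An hour holds 12 five-minute readings but 15 four-minute bins, so three bins per hour (:16, :36 and :56) always stay empty, and every remaining bin holds exactly one reading. That is why the CO values in the output are unchanged and nothing is actually averaged.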

Thanks for your response.

In simple words:
I have data in 5 minutes intervals. I want to change it to 4 minutes.
It looks like my code did not do that.

This may or may not be a good idea at all, so use it at your own risk.


library(dplyr)
library(tidyr)
library(lubridate)

# Sample data: the 15 rows shown in the question (the CO column is named val here)
(start_df <- structure(list(date = structure(c(
  1556665200, 1556665500, 1556665800,
  1556666100, 1556666400, 1556666700, 1556667000, 1556667300, 1556667600,
  1556667900, 1556668200, 1556668500, 1556668800, 1556669100, 1556669400
), class = c("POSIXct", "POSIXt"), tzone = ""), val = c(
  246, 234, 235, 247, 256, 269, 249, 242, 229, 234, 226, 222, 238, 232,
  280
)), row.names = c(NA, -15L), class = c("tbl_df", "tbl", "data.frame")))

# Expand to one row per minute (60 seconds), then carry the last observed
# value forward into the newly created rows
(expanded_df <- expand(start_df, date = full_seq(date, 60)) |>
    left_join(start_df, by = "date") |>
    fill(val) |>
    mutate(minutes = minute(date),
           minute_4 = minutes %% 4 == 0))

# Keep only the rows that fall on a 4-minute mark
(just_4_df <- filter(expanded_df, minute_4) |> select(date, val))
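
The code above carries the last observed value forward. If the "smooth out" idea from earlier in the thread is closer to what is wanted, one possible alternative is linear interpolation onto a 4-minute grid with base R's approx(). This is only a sketch under assumptions: the UTC time zone and the names times, co, grid, interp and four_min are made up for illustration, and the interpolated values are estimates, not measurements.

```r
# First nine readings from the question's sample (UTC assumed for reproducibility)
times <- seq(as.POSIXct("2019-05-01 00:00:00", tz = "UTC"),
             by = "5 min", length.out = 9)
co <- c(246, 234, 235, 247, 256, 269, 249, 242, 229)

# Target grid: every 4 minutes across the same time range
grid <- seq(min(times), max(times), by = "4 min")

# Linear interpolation between the two neighbouring 5-minute readings
interp <- approx(x = as.numeric(times), y = co, xout = as.numeric(grid))

four_min <- data.frame(date = grid, CO = interp$y)
four_min
```

For example, the 00:36:00 row sits one fifth of the way from the 00:35:00 reading (242) to the 00:40:00 reading (229), so it gets 242 + 0.2 * (229 - 242) = 239.4.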

