Post revised to help with clarity. I have a dataset of systolic blood pressure readings and the date/time each was taken. I have created a fictional example dataset for one individual below. The readings were sometimes taken within minutes of each other, sometimes not.
library(tidyverse)
# Timestamps as recorded (newest first)
date_time <- c(
  "Jan 29 2020 13:46:08",
  "Jan 29 2020 13:42:53",
  "Jan 29 2020 12:13:27",
  "Jan 29 2020 12:11:19",
  "Jan 29 2020 12:09:21",
  "Jan 28 2020 12:22:26",
  "Jan 27 2020 8:22:20",
  "Jan 25 2020 14:34:22",
  "Jan 25 2020 14:31:13",
  "Jan 23 2020 12:16:16",
  "Jan 23 2020 12:13:30",
  "Jan 20 2020 12:12:59",
  "Jan 20 2020 12:05:30",
  "Jan 20 2020 12:01:54"
)
# Systolic readings, in the same order as date_time
systol <- c(132, 132, 118, 115, 110, 148, 120,
            115, 117, 134, 136, 131, 132, 137)
# Parse the timestamps and sort oldest first
df <- data.frame(date_time, systol) %>%
  mutate(dtetime = lubridate::mdy_hms(date_time)) %>%
  arrange(dtetime)
df
#> date_time systol dtetime
#> 1 Jan 20 2020 12:01:54 137 2020-01-20 12:01:54
#> 2 Jan 20 2020 12:05:30 132 2020-01-20 12:05:30
#> 3 Jan 20 2020 12:12:59 131 2020-01-20 12:12:59
#> 4 Jan 23 2020 12:13:30 136 2020-01-23 12:13:30
#> 5 Jan 23 2020 12:16:16 134 2020-01-23 12:16:16
#> 6 Jan 25 2020 14:31:13 117 2020-01-25 14:31:13
#> 7 Jan 25 2020 14:34:22 115 2020-01-25 14:34:22
#> 8 Jan 27 2020 8:22:20 120 2020-01-27 08:22:20
#> 9 Jan 28 2020 12:22:26 148 2020-01-28 12:22:26
#> 10 Jan 29 2020 12:09:21 110 2020-01-29 12:09:21
#> 11 Jan 29 2020 12:11:19 115 2020-01-29 12:11:19
#> 12 Jan 29 2020 12:13:27 118 2020-01-29 12:13:27
#> 13 Jan 29 2020 13:42:53 132 2020-01-29 13:42:53
#> 14 Jan 29 2020 13:46:08 132 2020-01-29 13:46:08
I would like to group together systolic readings that were taken within a 10-minute span. This is the result I want:
Group 1: rows 1, 2
Group 2: row 3
Group 3: rows 4, 5
Group 4: rows 6, 7
Group 5: row 8
Group 6: row 9
Group 7: rows 10, 11, 12
Group 8: rows 13, 14
The ultimate objective is to average readings that were taken close to each other in time. The criterion for "close in time" is that all readings in a group are taken within 10 minutes; i.e., the last reading in any group has to be within 10 minutes of the first reading in that group. Groups can have any number of readings, as long as the group meets the 10-minutes-from-start-to-finish criterion.
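To make the criterion concrete, here is a minimal sketch of one way it could be written as a sequential rule: walk through the readings in time order and start a new group whenever a reading falls more than 10 minutes after the first reading of the current group. This is an illustration under stated assumptions, not a vetted solution. `group_by_window()` and `bp` are hypothetical names (not from any package), the readings are assumed to be sorted by time, and a reading exactly 600 seconds after the group's first reading is assumed to still count as "within 10 minutes".

```r
library(dplyr)
library(lubridate)

# Hypothetical helper (not from any package): assign a group number to each
# time, opening a new group when a reading is more than `window_secs` after
# the FIRST reading of the current group.
group_by_window <- function(times, window_secs = 600) {
  secs <- as.numeric(times)        # POSIXct -> seconds since the epoch
  stopifnot(!is.unsorted(secs))    # the rule assumes times are sorted
  grp <- integer(length(secs))
  if (length(secs) == 0) return(grp)
  start <- secs[1]                 # first reading of the current group
  current <- 1L
  for (i in seq_along(secs)) {
    if (secs[i] - start > window_secs) {
      current <- current + 1L      # outside the window: open a new group
      start <- secs[i]
    }
    grp[i] <- current
  }
  grp
}

# Same fictional readings as in the example data, rebuilt so this block runs alone
bp <- data.frame(
  date_time = c("Jan 20 2020 12:01:54", "Jan 20 2020 12:05:30",
                "Jan 20 2020 12:12:59", "Jan 23 2020 12:13:30",
                "Jan 23 2020 12:16:16", "Jan 25 2020 14:31:13",
                "Jan 25 2020 14:34:22", "Jan 27 2020 8:22:20",
                "Jan 28 2020 12:22:26", "Jan 29 2020 12:09:21",
                "Jan 29 2020 12:11:19", "Jan 29 2020 12:13:27",
                "Jan 29 2020 13:42:53", "Jan 29 2020 13:46:08"),
  systol = c(137, 132, 131, 136, 134, 117, 115,
             120, 148, 110, 115, 118, 132, 132)
) %>%
  mutate(dtetime = mdy_hms(date_time)) %>%
  arrange(dtetime) %>%
  mutate(grp = group_by_window(dtetime))

bp$grp
#> [1] 1 1 2 3 3 4 4 5 6 7 7 7 8 8

# Average the readings within each group
averaged <- bp %>%
  group_by(grp) %>%
  summarise(n = n(), mean_systol = mean(systol))
```

On the example data this reproduces the row groupings listed above (rows 1 and 2 together, row 3 alone, and so on). An explicit loop is used because each group's start depends on where the previous group ended, which is hard to express with a purely vectorized comparison.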
I've tried to solve this with the lag function, but it is awkward and possibly inaccurate. I suspect this might be addressed by detecting time "clusters", but I'm not familiar with those methods. I asked this question on Stack Overflow, but after a couple of weeks I haven't received any answers that I felt addressed the issue.
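As an illustration of why a lag-based rule can be inaccurate here, the sketch below applies one common formulation to rows 1 to 3 of the example: start a new group whenever the gap to the previous reading exceeds 10 minutes. This formulation is an assumption for illustration and not necessarily the exact code that was tried. The times are expressed as seconds since the first reading (12:01:54, 12:05:30 and 12:12:59 on Jan 20).

```r
library(dplyr)

# Rows 1-3 of the example, as seconds elapsed since the first reading
secs <- c(0, 216, 665)

# Gap to the previous reading (the first reading gets a gap of 0)
gap <- secs - dplyr::lag(secs, default = secs[1])

# Lag-based rule: open a new group when the gap exceeds 10 minutes
lag_grp <- cumsum(gap > 600) + 1
lag_grp
#> [1] 1 1 1
```

Each consecutive gap (216 and 449 seconds) is under 10 minutes, so all three rows chain into one group even though row 3 is 665 seconds after row 1. That violates the start-to-finish criterion, which is why the comparison has to be against the first reading of the group rather than the previous reading.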
I'm a fairly new R user most familiar with the tidyverse approach. I suspect this is not an unusual example since blood pressure readings taken close together in time are often averaged.
Thanks for any help!