Finding the time differences across a list of times

Hi All.

Hoping this is a relatively simple thing to address and I'm just drawing a blank on how to fix it, but I am trying to create an if_else condition for a summarize call that flags a participant whenever 2 dates occur within 30 days of one another.

I figured it would be faster to arrange the dates in ascending order, and group_by the IDs, and use a custom function extract the column of the dates by participant, and use lapply or map_df in combination with difftime to get all the days between dates, and then use the max of these values to flag the participant.

ID Dates
001 2001-01-01
001 2001-01-20
001 2001-02-03
002 2000-12-20
002 2001-01-15
002 2001-03-20
003 2000-12-20
003 2001-01-20
003 2001-03-22

to

ID Max_Days Flag_Min_30
001 19 0
002 30 1
003 61 1

However, I can't seem to find any documentation on how to run difftime or similar quick subtraction across the vector in a simple manner.

Considered seq but think that might be excessive.

Any suggestions would be appreciated!

Here's a quick sample of what the data looks like in it's current state:

df <- structure(list(ID = c("001", "001", "001", "002", "002", "002", 
"003", "003", "003"), Dates = structure(c(11323, 11342, 11356, 
11311, 11337, 11401, 11311, 11341, 11344), class = "Date")), class = "data.frame", row.names = c(NA, 
-9L))

Hi @TPDeRamus,

Here's one approach that starts as you did by ordering by ID and Dates, before working out the number of days between records for each record, then summarising this by ID then flag IDs with differences above 30 days:

library(dplyr)

df <- structure(list(ID = c("001", "001", "001", "002", "002", "002", 
                            "003", "003", "003"), Dates = structure(c(11323, 11342, 11356, 
                                                                      11311, 11337, 11401, 11311, 11341, 11344), class = "Date")), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                       -9L))
summary <- df |>
  arrange(ID, Dates) |> 
  mutate(
    # get days since previous date for this ID for each record
    days_diff = as.numeric(Dates - lag(Dates, 1)),
    .by = ID
  ) |> 
  summarise(
    # what is the maximum day difference per ID
    Max_Days = max(days_diff, na.rm = TRUE),
    .by = ID
  ) |> 
  mutate(
    # flag cases where maximum is above threshold
    Flag_Min_30 = as.numeric(Max_Days >= 30)
  )

summary
#>    ID Max_Days Flag_Min_30
#> 1 001       19           0
#> 2 002       64           1
#> 3 003       30           1

Created on 2024-10-28 with reprex v2.1.0

1 Like

This is perfect!

Thanks so much @craig.parylo !

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.