Finding the time differences across a list of times

TPDeRamus · October 28, 2024, 4:06am

Hi All.

Hoping this is a relatively simple thing to address and I'm just drawing a blank on how to fix it, but I am trying to create an if_else condition for a summarize call that flags a participant whenever 2 dates occur within 30 days of one another.

I figured it would be faster to arrange the dates in ascending order, and group_by the IDs, and use a custom function extract the column of the dates by participant, and use lapply or map_df in combination with difftime to get all the days between dates, and then use the max of these values to flag the participant.

ID	Dates
001	2001-01-01
001	2001-01-20
001	2001-02-03
002	2000-12-20
002	2001-01-15
002	2001-03-20
003	2000-12-20
003	2001-01-20
003	2001-03-22

to

ID	Max_Days	Flag_Min_30
001	19	0
002	30	1
003	61	1

However, I can't seem to find any documentation on how to run difftime or similar quick subtraction across the vector in a simple manner.

Considered seq but think that might be excessive.

Any suggestions would be appreciated!

Here's a quick sample of what the data looks like in it's current state:

df <- structure(list(ID = c("001", "001", "001", "002", "002", "002", 
"003", "003", "003"), Dates = structure(c(11323, 11342, 11356, 
11311, 11337, 11401, 11311, 11341, 11344), class = "Date")), class = "data.frame", row.names = c(NA, 
-9L))

craig.parylo · October 28, 2024, 7:54am

Hi @TPDeRamus,

Here's one approach that starts as you did by ordering by ID and Dates, before working out the number of days between records for each record, then summarising this by ID then flag IDs with differences above 30 days:

library(dplyr)

df <- structure(list(ID = c("001", "001", "001", "002", "002", "002", 
                            "003", "003", "003"), Dates = structure(c(11323, 11342, 11356, 
                                                                      11311, 11337, 11401, 11311, 11341, 11344), class = "Date")), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                       -9L))
summary <- df |>
  arrange(ID, Dates) |> 
  mutate(
    # get days since previous date for this ID for each record
    days_diff = as.numeric(Dates - lag(Dates, 1)),
    .by = ID
  ) |> 
  summarise(
    # what is the maximum day difference per ID
    Max_Days = max(days_diff, na.rm = TRUE),
    .by = ID
  ) |> 
  mutate(
    # flag cases where maximum is above threshold
    Flag_Min_30 = as.numeric(Max_Days >= 30)
  )

summary
#>    ID Max_Days Flag_Min_30
#> 1 001       19           0
#> 2 002       64           1
#> 3 003       30           1

^{Created on 2024-10-28 with reprex v2.1.0}

TPDeRamus · October 28, 2024, 12:15pm

This is perfect!

Thanks so much @craig.parylo !

system · November 4, 2024, 12:16pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.