Hi All.
Hoping this is a relatively simple thing to address and I'm just drawing a blank on how to fix it. I am trying to create an `if_else` condition for a `summarize` call that flags a participant whenever 2 dates occur within 30 days of one another.
I figured it would be faster to arrange the dates in ascending order, `group_by` the IDs, use a custom function to extract the column of dates for each participant, use `lapply` or `map_df` in combination with `difftime` to get the number of days between consecutive dates, and then use the `max` of these values to flag the participant.
For example, I would like to go from
ID | Dates |
---|---|
001 | 2001-01-01 |
001 | 2001-01-20 |
001 | 2001-02-03 |
002 | 2000-12-20 |
002 | 2001-01-15 |
002 | 2001-03-20 |
003 | 2000-12-20 |
003 | 2001-01-20 |
003 | 2001-03-22 |
to a summary like this:
ID | Max_Days | Flag_Min_30 |
---|---|---|
001 | 19 | 0 |
002 | 30 | 1 |
003 | 61 | 1 |
However, I can't seem to find any documentation on how to run `difftime` or a similar quick subtraction across the vector in a simple manner.
I considered `seq` but think that might be excessive.
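To make the "subtraction across the vector" idea concrete: base R's `diff()` has a method for `Date` vectors that returns the gaps between consecutive elements as a `difftime`. Below is a minimal sketch of that idea on a single participant; the names `dates` and `gaps` are made up for illustration, and the values are participant 001's dates from the first table.

```r
# Illustrative vector: participant 001's dates from the table above
dates <- as.Date(c("2001-01-01", "2001-01-20", "2001-02-03"))

# diff() on a sorted Date vector gives the gaps between consecutive dates
# as a difftime; as.numeric(..., units = "days") turns them into plain numbers
gaps <- as.numeric(diff(sort(dates)), units = "days")

gaps       # 19 14
max(gaps)  # 19
```

This only compares neighbouring dates after sorting, which seems to be what the `Max_Days` column is meant to capture.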
Any suggestions would be appreciated!
Here's a quick sample of what the data looks like in its current state:
df <- structure(list(
  ID = c("001", "001", "001", "002", "002", "002", "003", "003", "003"),
  Dates = structure(c(11323, 11342, 11356, 11311, 11337, 11401, 11311, 11341, 11344),
                    class = "Date")),
  class = "data.frame", row.names = c(NA, -9L))
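One possible way to put the pieces together is sketched below, assuming the `dplyr` package is available. The output column names `Max_Days` and `Flag_Min_30` are taken from the desired table, the name `result` is made up for illustration, and the `>= 30` threshold is an assumption inferred from that table (where a maximum of 30 is flagged as 1). Note that the `dput` sample differs from the illustrative table for ID 003, so on this sample the maxima come out as 19, 64 and 30 rather than the values shown in the table.

```r
library(dplyr)

# Sample data, repeated so the block runs on its own
df <- structure(list(
  ID = c("001", "001", "001", "002", "002", "002", "003", "003", "003"),
  Dates = structure(c(11323, 11342, 11356, 11311, 11337, 11401, 11311, 11341, 11344),
                    class = "Date")),
  class = "data.frame", row.names = c(NA, -9L))

result <- df %>%
  arrange(ID, Dates) %>%          # sort dates within each participant
  group_by(ID) %>%
  summarise(
    # largest gap, in days, between consecutive dates for this participant
    # (a participant with a single date would give -Inf and a warning)
    Max_Days = max(as.numeric(diff(Dates), units = "days")),
    # assumed rule: flag when the largest gap is 30 days or more
    Flag_Min_30 = if_else(Max_Days >= 30, 1L, 0L),
    .groups = "drop"
  )

result
```

If the intent is instead to flag participants who have any two dates within 30 days of each other, the same pattern should work with `min()` in place of `max()` and the comparison reversed.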