Hi All.
I'm hoping this is relatively simple to address and I'm just drawing a blank on how to fix it. I am trying to create an if_else condition for a summarize call that flags a participant whenever 2 dates occur within 30 days of one another.
I figured the fastest approach would be to arrange the dates in ascending order, group_by the IDs, use a custom function to extract the column of dates for each participant, use lapply or map_df in combination with difftime to get the number of days between consecutive dates, and then use the max of these values to flag the participant. In other words, I want to go from
| ID | Dates |
|---|---|
| 001 | 2001-01-01 |
| 001 | 2001-01-20 |
| 001 | 2001-02-03 |
| 002 | 2000-12-20 |
| 002 | 2001-01-15 |
| 002 | 2001-03-20 |
| 003 | 2000-12-20 |
| 003 | 2001-01-20 |
| 003 | 2001-03-22 |
to
| ID | Max_Days | Flag_Min_30 |
|---|---|---|
| 001 | 19 | 0 |
| 002 | 30 | 1 |
| 003 | 61 | 1 |
However, I can't seem to find any documentation on how to run difftime, or a similar quick subtraction, across consecutive elements of a vector in a simple manner.
I considered seq, but I think that might be excessive.
Any suggestions would be appreciated!
Here's a quick sample of what the data looks like in its current state:
```r
df <- structure(list(
  ID = c("001", "001", "001", "002", "002", "002", "003", "003", "003"),
  Dates = structure(c(11323, 11342, 11356, 11311, 11337, 11401,
                      11311, 11341, 11344), class = "Date")
), class = "data.frame", row.names = c(NA, -9L))
```
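
For reference, here is a minimal base R sketch of the approach described above, built on diff(), which works directly on Date vectors and returns the gaps between consecutive elements as a difftime. The helper name max_gap is hypothetical, and the flag rule (1 when the largest gap is 30 days or more) is an assumption inferred from the desired output table rather than a confirmed definition. The numbers come from the dput() sample, so they may not match the illustrative tables exactly.

```r
# Sample data rebuilt from the dput() values (days since 1970-01-01)
df <- data.frame(
  ID = rep(c("001", "002", "003"), each = 3),
  Dates = as.Date(c(11323, 11342, 11356,
                    11311, 11337, 11401,
                    11311, 11341, 11344), origin = "1970-01-01")
)

# Hypothetical helper: largest gap in days between consecutive dates
max_gap <- function(dates) {
  if (length(dates) < 2) return(NA_real_)  # no gap with fewer than 2 dates
  gaps <- diff(sort(dates))                # difftime of consecutive gaps
  max(as.numeric(gaps, units = "days"))
}

# Split the dates by participant and apply the helper to each group
max_days <- sapply(split(df$Dates, df$ID), max_gap)

result <- data.frame(
  ID = names(max_days),
  Max_Days = unname(max_days),
  # Assumed rule: flag when the largest gap is at least 30 days
  Flag_Min_30 = as.integer(max_days >= 30)
)

print(result)
```

If dplyr is already loaded, the same idea should fit inside summarize() after group_by(ID), for example Max_Days = max(as.numeric(diff(sort(Dates)))), with the flag computed from Max_Days via if_else.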