Hi Posit Community
Have a bit of an interesting case on my hand and nothing jumped out at me in the documentation.
I have a set of data not unlike the following example.
ID | date | visit | type |
---|---|---|---|
000000 | 2022-02-21 | 2 | Type_I |
000000 | 2023-02-01 | 1 | Type_I |
000001 | 2023-03-18 | 1 | Type_I |
000001 | 2023-11-03 | 3 | Type_II |
000001 | 2022-01-31 | 2 | Type_II |
000002 | 2022-03-11 | 3 | Type_I |
000002 | 2022-09-04 | 4 | Type_I |
000002 | 2023-01-20 | 2 | Type_I |
000002 | 2022-02-14 | 1 | Type_I |
000003 | 2023-01-08 | 2 | Type_I |
000003 | 2023-10-12 | 3 | Type_I |
000003 | 2023-01-10 | 1 | Type_I |
000004 | 2023-12-21 | 2 | Type_I |
000004 | 2022-09-13 | 3 | Type_I |
000004 | 2023-09-28 | 1 | Type_I |
000004 | 2022-09-08 | 4 | Type_I |
000005 | 2022-05-12 | 3 | Type_I |
000005 | 2022-08-22 | 1 | Type_I |
000005 | 2022-06-20 | 2 | Type_II |
000006 | 2023-08-10 | 1 | Type_I |
df <- structure(list(ID = c("000000", "000000", "000001", "000001",
"000001", "000002", "000002", "000002", "000002", "000003", "000003",
"000003", "000004", "000004", "000004", "000004", "000005", "000005",
"000005", "000006"), date = structure(c(19044, 19389, 19434,
19664, 19023, 19062, 19239, 19377, 19037, 19365, 19642, 19367,
19712, 19248, 19628, 19243, 19124, 19226, 19163, 19579), class = "Date"),
visit = c(2L, 1L, 1L, 3L, 2L, 3L, 4L, 2L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 4L, 3L, 1L, 2L, 1L), type = c("Type_I", "Type_I",
"Type_I", "Type_II", "Type_II", "Type_I", "Type_I", "Type_I",
"Type_I", "Type_I", "Type_I", "Type_I", "Type_I", "Type_I",
"Type_I", "Type_I", "Type_I", "Type_I", "Type_II", "Type_I"
)), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"
))
Where I need to set a conditional boolean flag based on the criteria:
- At least one instance of "Type_II"
- Two or more cases of "Type_I" that occur within at least 90 days of one another
So that the output would look something like this:
ID | Flag |
---|---|
000000 | 0 |
000001 | 1 |
000002 | 1 |
000003 | 1 |
000004 | 1 |
000005 | 1 |
000006 | 0 |
My first instinct here was to use pivot_wider
and add_count
by ID to check for entries by participant and set up conditional criteria for subsequent summaries:
df2 <- df |> pivot_wider(names_from = type, values_from = date) |> add_count(ID)
And flag !is.na(Type_II)
and between(Type_I<date_y>,Type_I<date_x>,(Type_I<date_x>+days(90))
across any ascending order of dates to conditionally set the flags, but I'm having difficulty understanding how to succinctly combine summarize
and case_when
to do this.
I would appreciate any suggestions on how this might be accomplished within the tidyverse
framework or otherwise.
Thank you in advance!