Ideally, I would like to fill a variable, period, based on the difference of two, changing, dates.
Say my df looks like this:
id | index_date | test_date | period |
---|---|---|---|
1 | 1/1/2006 | 1/1/2006 | |
1 | 1/1/2006 | 3/4/2007 | |
1 | 1/1/2006 | 3/5/2007 | |
1 | 1/1/2006 | 6/18/2012 | |
2 | 2/16/2016 | 2/16/2016 | |
2 | 2/16/2016 | 5/1/2018 | |
2 | 2/16/2016 | 9/6/2018 | |
3 | 9/10/2008 | 9/10/2008 | |
3 | 9/10/2008 | 7/5/2013 | |
4 | 6/8/2014 | 6/8/2014 | |
4 | 6/8/2014 | 6/28/2014 | |
4 | 6/8/2014 | 7/13/2014 | |
4 | 6/8/2014 | 5/24/2017 | |
4 | 6/8/2014 | 12/4/2020 | |
4 | 6/8/2014 | 12/10/2020 |
period is a factor that corresponds to a calculation based on previous test dates. Here is what I would like to do in
df %>%
group_by(id) %>%
arrange(test_date, .by_group = T) %>%
mutate(
period =
case_when(
test_date[2] - test_date[1] >= 45 ~ 1,
test_date[3] - test_date[2] >= 45 ~ 2
...
)
)
While this would work if there were few test_date/participant, some ids have over 300 test_dates. I'm thinking I could use lag here but not sure how to implement. The final df should look something like this:
id | index_date | test_date | period |
---|---|---|---|
1 | 1/1/2006 | 1/1/2006 | NA |
1 | 1/1/2006 | 3/4/2007 | 1 |
1 | 1/1/2006 | 3/5/2007 | NA |
1 | 1/1/2006 | 6/18/2012 | 2 |
2 | 2/16/2016 | 2/16/2016 | NA |
2 | 2/16/2016 | 5/1/2018 | 1 |
2 | 2/16/2016 | 9/6/2018 | 2 |
3 | 9/10/2008 | 9/10/2008 | NA |
3 | 9/10/2008 | 7/5/2013 | 1 |
4 | 6/8/2014 | 6/8/2014 | NA |
4 | 6/8/2014 | 6/28/2014 | 1 |
4 | 6/8/2014 | 7/13/2014 | 1 |
4 | 6/8/2014 | 5/24/2017 | 2 |
4 | 6/8/2014 | 12/4/2020 | 3 |
4 | 6/8/2014 | 12/10/2020 | 3 |