dplyr::case_when() + lag to iterate through a column

Ideally, I would like to fill a variable, period, based on the difference of two, changing, dates.

Say my df looks like this:

id index_date test_date period
1 1/1/2006 1/1/2006
1 1/1/2006 3/4/2007
1 1/1/2006 3/5/2007
1 1/1/2006 6/18/2012
2 2/16/2016 2/16/2016
2 2/16/2016 5/1/2018
2 2/16/2016 9/6/2018
3 9/10/2008 9/10/2008
3 9/10/2008 7/5/2013
4 6/8/2014 6/8/2014
4 6/8/2014 6/28/2014
4 6/8/2014 7/13/2014
4 6/8/2014 5/24/2017
4 6/8/2014 12/4/2020
4 6/8/2014 12/10/2020

period is a factor that corresponds to a calculation based on previous test dates. Here is what I would like to do in

df %>%
    group_by(id) %>%
    arrange(test_date, .by_group = T) %>%
    mutate(
        period =
            case_when(
                test_date[2] - test_date[1] >= 45 ~ 1,
                test_date[3] - test_date[2] >= 45 ~ 2
...
            )
    )

While this would work if there were few test_date/participant, some ids have over 300 test_dates. I'm thinking I could use lag here but not sure how to implement. The final df should look something like this:

id index_date test_date period
1 1/1/2006 1/1/2006 NA
1 1/1/2006 3/4/2007 1
1 1/1/2006 3/5/2007 NA
1 1/1/2006 6/18/2012 2
2 2/16/2016 2/16/2016 NA
2 2/16/2016 5/1/2018 1
2 2/16/2016 9/6/2018 2
3 9/10/2008 9/10/2008 NA
3 9/10/2008 7/5/2013 1
4 6/8/2014 6/8/2014 NA
4 6/8/2014 6/28/2014 1
4 6/8/2014 7/13/2014 1
4 6/8/2014 5/24/2017 2
4 6/8/2014 12/4/2020 3
4 6/8/2014 12/10/2020 3

What do you mean by test_date[2] ?
As I understand it, this would be the value of the column test_date for the second observation.
This would imply that whatever the calculation in the case_when clause the result period for all observations would be the same.
How could then the value for the first observation be NA ?

Please indicate exactly in words how you would like to calculate period.
And please provide you input as described e.g. in
https://forum.posit.co/t/faq-whats-a-reproducible-example-reprex-and-how-do-i-do-one/5219

1 Like

This topic was automatically closed 42 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.