I have a very big dataset including data for 130.000 persons (ID). Each person has been using a medicine in some intervals, sometimes with unintended periods of pauses (dose=0). However, when the pause was intended by physician, there is no row of dose=0, but the next row simply starts with a new medication-period.
What I want to do is: when a medication-period was intended, i.e. followed by one or more medication-periods with no rows with pauses (dose=0) inbetween, then these rows should be changed to 1 row starting with "firstday" of the first medication-period and ending with the "lastday" of the last medication-period. Otherwise, if the rows are separated by a row with "dose=0", then this mean the pause was unintended and nothing should be changed in this row.
Here is an example:
test_df <- structure(list(ID = c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c", "c", "c"), dose = c(20, 0, 20, 0, 20, 20, 5, 5, 5, 5, 5, 5, 5, 0, 5, 5), firstday= c("2016-04-28", "2016-08-08","2018-01-20","2018-04-30","2019-12-23","2020-04-02", "2019-01-04", "2019-01-30", "2019-02-26", "2019-03-12", "2019-04-11", "2019-01-31", "2019-03-05", "2019-03-12", "2019-04-23", "2019-05-27"), lastday= c("2016-08-08", "2018-01-20","2018-04-30","2019-12-23","2020-04-01","2020-07-11", "2019-01-11", "2019-02-06", "2019-03-05", "2019-03-19", "2019-04-18", "2019-02-14", "2019-03-12", "2019-04-23", "2019-04-30", "2019-06-03")), row.names = c(NA, -16L), class = "data.frame", .Names = c("ID", "dose", "firstday", "lastday"))
As you can see,"a" had 2 medication-periods after each other with no "dose=0", which mean that row 5 and 6 should be changed to 1 row containing "firstday" from row 5 and "lastday" from row 6. Rows 7 to 11 (fro ID=b) should be converted to only 1 row, containing "firstday" from row 7 and "lastday" from row 11. And rows 15 and 16 (for ID=c) should be converted to 1 row containing "firstday" from row 15 and "lastday" from row 16.
I have tried to do it stepwise by first generating a new column including the dates I want and then deleting excess columns:
tryied < test_df %>% group_by(ID) %>% mutate(new_column = ifelse(dose[row_number() + 1] , lastday, 0), new_column = c(new_column[-n()] , 0))
Howevever, this only works if a medication-period is followed by 1 other medication-period and not if there are several medication-periods following each other, and is not satisfiying at all...