Filter rows before threshold

Doing some Coronavirus work.
I've seen some models that peg the local start date to when there are three days in a row with new cases identified. What I see in many counties is they get the first case and then a period of days, sometimes a month, goes by before they get the next case.
How do I filter out the rows in a county that were before there were three continuous days of cases.
I'm only showing one county for brevity but I would want this to go through the entire dataset and do it for every county. In this case all of the rows with ttl_cases == 1 would be removed.

Thanks for any help.

This is a use case for run length encoding. In outline use mutate(has_case = ifelse(FUNCTION(.) == 0,0,1)) to create a flag column indicating days with or without new cases. (I assume you have that already with lag), then make an rle function to test whether the runs of has_case == 1 satisfies your selection criteria, use that to mutate into a select_this_record's index as the starting date variable and filter away.

(I'd be more succinct with a reprex, but see what you find and come back if you get lost.)

Run-length encoding turned out to be instrumental for removing leading days. I was also able to churn through each county using the purrr package.
Check out my paper on the topic. I gave you a shout out.

1 Like

Sweet! Thanks for the h/t. Glad you were able to see what was going on with the rather tedious example.

Since COVID-19 is no respecter of state boundaries, you might consider the metro

35620 New York-Northern New Jersey-Long Island, NY-NJ-PA (Pennsylvania portion not in sample. White Plains central city recoded to
balance of metropolitan)

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.