Filter rows before threshold

Doing some Coronavirus work.
I've seen some models that peg the local start date to when there are three days in a row with new cases identified. What I see in many counties is they get the first case and then a period of days, sometimes a month, goes by before they get the next case.
How do I filter out the rows in a county that were before there were three continuous days of cases.
I'm only showing one county for brevity but I would want this to go through the entire dataset and do it for every county. In this case all of the rows with ttl_cases == 1 would be removed.
"index","id","date","ttl_cases"
"1","Jefferson_Colorado",2020-03-05,1
"2","Jefferson_Colorado",2020-03-06,1
"3","Jefferson_Colorado",2020-03-07,1
"4","Jefferson_Colorado",2020-03-08,1
"5","Jefferson_Colorado",2020-03-09,1
"6","Jefferson_Colorado",2020-03-10,2
"7","Jefferson_Colorado",2020-03-11,4
"8","Jefferson_Colorado",2020-03-12,5
"9","Jefferson_Colorado",2020-03-13,7
"10","Jefferson_Colorado",2020-03-14,9

Thanks for any help.

This is a use case for run length encoding. In outline use mutate(has_case = ifelse(FUNCTION(.) == 0,0,1)) to create a flag column indicating days with or without new cases. (I assume you have that already with lag), then make an rle function to test whether the runs of has_case == 1 satisfies your selection criteria, use that to mutate into a select_this_record's index as the starting date variable and filter away.

(I'd be more succinct with a reprex, but see what you find and come back if you get lost.)

technocrat,
Run-length encoding turned out to be instrumental for removing leading days. I was also able to churn through each county using the purrr package.
Check out my paper on the topic. I gave you a shout out.
https://rpubs.com/kieroneil/597539

1 Like

Sweet! Thanks for the h/t. Glad you were able to see what was going on with the rather tedious example.

Since COVID-19 is no respecter of state boundaries, you might consider the metro

35620 New York-Northern New Jersey-Long Island, NY-NJ-PA (Pennsylvania portion not in sample. White Plains central city recoded to
balance of metropolitan)

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.