How can I add a column to the data frame that reports whether a particular facility has treated fewer than 5 patients or 5 or more? (I'm basically trying to find out whether a given patient was treated at a facility with a lot of experience with that particular treatment.)
From your question, I'm not sure whether year of diagnosis matters when counting the number of treatments. Assuming it does not, here is an approach that groups by TXT and FacilityID and then counts the number of observations in each group. If that count is greater than or equal to 5, over_5 will be TRUE; otherwise it will be FALSE. Just a note: I created an n_procedures variable for extra clarity, but you could skip that step and use n() >= 5 directly in the if_else().
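A minimal sketch of that approach with dplyr, assuming the data frame has `TXT` and `FacilityID` columns as in the question. The toy data and the `PatientID` column are made up purely for illustration. The group size comes from `n()`, which returns the number of rows in the current group.

```r
library(dplyr)

# Illustrative toy data only; PatientID is a hypothetical column.
df <- tibble(
  PatientID  = 1:9,
  FacilityID = c("A", "A", "A", "A", "A", "A", "B", "B", "B"),
  TXT        = c("surgery", "surgery", "surgery", "surgery", "surgery",
                 "radiation", "surgery", "surgery", "radiation")
)

df <- df %>%
  # If year of diagnosis matters, add it here, e.g.
  # group_by(TXT, FacilityID, YearDx)  (YearDx is a hypothetical column name)
  group_by(TXT, FacilityID) %>%
  mutate(
    n_procedures = n(),                                 # rows in this TXT/FacilityID group
    over_5       = if_else(n_procedures >= 5, TRUE, FALSE)
  ) %>%
  ungroup()

df
```

Here facility A has 5 surgery rows, so all five of those patients get `over_5 = TRUE`, while every other group gets `FALSE`. Using `row_number()` instead would give a running count within each group, so only the fifth and later rows of a group would be flagged.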