How do you remove duplicate rows while conditionally keeping the observation with the larger value?

When there are duplicate rows that differ in only one column (Rate), how do you remove the duplicates with the smaller value, keeping the row with the higher Rate without altering the other data? In other words, when there are duplicates, I want to conditionally select the row with the higher value in the Rate column. Any tips? So far I did a group_by and summarised a count to find the duplicates, filtered the data frame down to just those duplicates, removed the smaller values with a filter using &, and was going to rbind the result back, but there has to be a quicker way.

Input:

Town     Occupation        Rate
Chicago  Teacher           20
Chicago  Teacher           25
Madison  Nurse             0
Madison  Nurse             35
Madison  Doctor            45
Madison  Custodian         33
Chicago  Dental Hygienist  22

Desired output:

Town     Occupation        Rate
Chicago  Teacher           25
Madison  Nurse             35
Madison  Doctor            45
Madison  Custodian         33
Chicago  Dental Hygienist  22

I think you want to group by Town and Occupation and summarize to find the maximum Rate.
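A sketch of that summarise approach, assuming the example data above is in a tibble called df (the tribble() call below just recreates it for illustration):

```r
library(dplyr)

# Recreate the example data from the question
df <- tribble(
  ~Town,     ~Occupation,        ~Rate,
  "Chicago", "Teacher",          20,
  "Chicago", "Teacher",          25,
  "Madison", "Nurse",            0,
  "Madison", "Nurse",            35,
  "Madison", "Doctor",           45,
  "Madison", "Custodian",        33,
  "Chicago", "Dental Hygienist", 22
)

# One row per Town/Occupation pair, keeping the maximum Rate
result <- df %>%
  group_by(Town, Occupation) %>%
  summarise(Rate = max(Rate), .groups = "drop")
```

Note that summarise() keeps only the grouping columns and the summarised column, which is fine here since the data has no other columns; if there were extra columns you wanted to preserve, a row-keeping approach like slice_max() would be a better fit.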


This should work:

df %>% 
  group_by(Town, Occupation) %>%
  slice_max(Rate) %>%
  ungroup()
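One caveat worth adding (not raised in the thread): by default slice_max() keeps all rows tied for the maximum. If ties are possible in your data and you want exactly one row per group, you can break them explicitly:

```r
library(dplyr)

df <- tribble(
  ~Town,     ~Occupation, ~Rate,
  "Chicago", "Teacher",   25,
  "Chicago", "Teacher",   25   # a tie, for illustration
)

# with_ties = FALSE returns a single row per group even when Rates tie
deduped <- df %>%
  group_by(Town, Occupation) %>%
  slice_max(Rate, n = 1, with_ties = FALSE) %>%
  ungroup()
```

Which row survives a tie then depends on row order, so sort first (e.g. with arrange()) if that matters.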
