View and then remove one of two duplicate values

Hello-

I have a duplicated case_id and want to keep only one of the two rows. How do I filter and browse the duplicates? For example, df_linelist$case_id contains the value SSudan_Juba_QTF5V twice. I eventually want to remove one of the two rows, but not both, so I'm trying to filter first to look at them.

#SSudan_Juba_QTF5V
filter(df_linelist$case_id = "SSudan_Juba_QTF5V")
df_linelist %>% filter(df_linelist$case_id=="SSudan_Juba_QTF5V")

Grace

Try df_linelist %>% group_by(case_id) %>% filter(n() > 1).
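
To try this on a small example first, here is a minimal sketch. The toy data is made up for illustration: only the `case_id` column name and the ID `SSudan_Juba_QTF5V` come from the question, while the `age` column and the second ID are hypothetical.

```r
library(dplyr)

# Hypothetical toy line list; the age column and the second ID are invented
df_linelist <- data.frame(
  case_id = c("SSudan_Juba_QTF5V", "SSudan_Juba_AAAAA", "SSudan_Juba_QTF5V"),
  age     = c(34, 27, 35)
)

# Look at one specific ID: use the bare column name and compare with ==
one_id <- df_linelist %>% filter(case_id == "SSudan_Juba_QTF5V")

# Or keep every row whose case_id appears more than once
dupes <- df_linelist %>% group_by(case_id) %>% filter(n() > 1) %>% ungroup()

print(dupes)  # in RStudio, View(dupes) opens the same rows in the data viewer
```

Note that inside `filter()` a single `=` is treated as a named argument rather than a comparison, which is probably why the first attempt in the question did not work.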

Building on @siddharthprabhu's answer just a smidge: if you may have IDs that occur more than twice and that is useful to know, you can get a result that shows the number of occurrences for every ID that has at least one duplicate:

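library(dplyr)
# Count rows per case_id, keep IDs that occur more than once, most frequent first
df <- df_linelist %>% group_by(case_id) %>% summarise(occurrences = n()) %>%
      filter(occurrences > 1) %>% arrange(desc(occurrences))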

If you then want to compare the duplicates across the other columns of the original data frame (to see whether the records are identical throughout, or whether some values differ between them), you can join the filtered list back onto the original data frame. That gives you all columns, but only for the duplicated rows:

# Join the duplicated IDs back to the original data to see every column
df <- df_linelist %>% group_by(case_id) %>% summarise(occurrences = n()) %>%
      filter(occurrences > 1) %>% arrange(desc(occurrences)) %>%
      left_join(df_linelist, by = "case_id")
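
Once you have looked at the duplicates, removing one of each could look roughly like the sketch below. It uses made-up data (the `age` column and the second ID are hypothetical, not from the question), and which row to keep is a judgment call that depends on your data.

```r
library(dplyr)

# Hypothetical toy line list; the two QTF5V rows differ in age on purpose
df_linelist <- data.frame(
  case_id = c("SSudan_Juba_QTF5V", "SSudan_Juba_AAAAA", "SSudan_Juba_QTF5V"),
  age     = c(34, 27, 35)
)

# Option 1: drop a row only when every column matches an earlier row
exact_dedup <- df_linelist %>% distinct()

# Option 2: keep the first row for each case_id, whatever the other columns hold
first_per_id <- df_linelist %>% distinct(case_id, .keep_all = TRUE)
```

With this toy data, option 1 removes nothing because the two rows for the same ID differ in `age`, while option 2 keeps whichever of the two comes first. If you would rather keep a particular row, sorting with `arrange()` before `distinct()` is one way to control which row that is.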
