Hi everyone. I'm cleaning a dataset that looks like this:
df <- data.frame(First_Name = c("Adam", "John", "Daniel", "Jack", "David", "Emily"),
                 Second_Name = c("White", NA, "Brown", "White", "Simpsons", "Simpson"),
                 Carer_Number = c("1010101", "9494949", NA, "9494949", "464646", "9494949"),
                 Company = c(NA, "CompanyB", "CompanyC", "CompanyD", "CompanyE", NA))
My aim is to remove duplicates in the Carer_Number column. My questions are:
- When I use the code below to filter for duplicated values in the column, I only get two rows back (I was expecting 3). Am I doing something wrong here?
library(dplyr)  # provides %>% and filter()
repeats_df <- df %>%
  filter(duplicated(Carer_Number))
- How do I keep a duplicated number in the column based on other conditions? For example, only keep a duplicated
Carer_Number if the Company and Second_Name columns are both non-empty.
Thanks.
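
For reference, here is a minimal sketch of one way both questions could be approached, written in base R so it runs without extra packages. It assumes that "empty" means NA, that rows with a unique Carer_Number should always be kept, and that a missing Carer_Number is not treated as a duplicate unless it occurs more than once. The names later_copies, all_copies, complete and cleaned_df are made up for illustration.

```r
df <- data.frame(
  First_Name   = c("Adam", "John", "Daniel", "Jack", "David", "Emily"),
  Second_Name  = c("White", NA, "Brown", "White", "Simpsons", "Simpson"),
  Carer_Number = c("1010101", "9494949", NA, "9494949", "464646", "9494949"),
  Company      = c(NA, "CompanyB", "CompanyC", "CompanyD", "CompanyE", NA)
)

# duplicated() marks only the second and later occurrences of a value,
# so a number that appears three times produces two TRUEs, not three.
later_copies <- duplicated(df$Carer_Number)

# Combining a forward scan with a backward scan (fromLast = TRUE)
# marks every row whose number occurs more than once.
all_copies <- later_copies | duplicated(df$Carer_Number, fromLast = TRUE)
repeats_df <- df[all_copies, ]

# Assumed rule: keep a row if its number is unique, or if it is a
# repeated number and both Second_Name and Company are present.
complete   <- !is.na(df$Second_Name) & !is.na(df$Company)
cleaned_df <- df[!all_copies | complete, ]
```

The same logical expressions should also work inside dplyr's filter(), for example filter(df, duplicated(Carer_Number) | duplicated(Carer_Number, fromLast = TRUE)). If more than one row with the same number passes the completeness check, this sketch keeps all of them, so a further rule would be needed to pick just one.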