Check for duplicates of a unique ID

I have a data frame like the one below. I want to check whether the same id appears more than once (with a different name each time),
and then mutate a new column that flags those rows.

df10 <- data.frame(
  id = c(9143, 2357, 4339, 8927, 9143, 4285, 2683, 8217,
         3702, 7857, 3255, 4262, 8501, 7111, 2681, 6970),
  name = c("xly,mnn", "xab,Lan", "mhy,mun", "vgtu,mmc", "ftu,sdh", "kull,nnhu",
           "hula,njam", "mund,jiha", "htfy,ntha", "sghu,njui", "sgyu,hytb",
           "vdti,kula", "mftyu,huta", "mhuk,ghul", "cday,bhsue", "ajtu,nudj")
)

The output should look like this:

id name duplicate_name
9143 xly,mnn 1
2357 xab,Lan 0
4339 mhy,mun 0
8927 vgtu,mmc 0
9143 ftu,sdh 1
4285 kull,nnhu 0
2683 hula,njam 0
8217 mund,jiha 0
3702 htfy,ntha 0
7857 sghu,njui 0
3255 sgyu,hytb 0
4262 vdti,kula 0
8501 mftyu,huta 0
7111 mhuk,ghul 0
2681 cday,bhsue 0
6970 ajtu,nudj 0

Hi,

Use group_by(id), apply a counting function, then ungroup().

This code counts, for each row, how many other rows share its id (n() - 1). An id that appears only once gets 0.

library(dplyr)

df10 %>% 
  group_by(id) %>% 
  mutate(duplicate_name = n() - 1) %>%  # rows in this id group, minus the row itself
  ungroup()
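
As a quick check of what n() - 1 returns when an id occurs more than twice, here is a minimal sketch. The toy data frame and the names toy and counted are made up for illustration and are not from the question.

```r
library(dplyr)

# Hypothetical toy data: id 1 appears three times, id 2 once
toy <- data.frame(id = c(1, 1, 1, 2), name = c("a", "b", "c", "d"))

counted <- toy %>%
  group_by(id) %>%
  mutate(duplicate_name = n() - 1) %>%  # number of other rows sharing this id
  ungroup()

counted$duplicate_name
```

Each row of id 1 gets 2 rather than 1, so this column is a count of extra rows, not a yes/no flag.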

If you just need a binary 0/1 variable, follow the count with an if_else() to convert it to 0/1:

df10 %>% 
  group_by(id) %>% 
  add_count() %>%  # adds a column n with the number of rows per id
  mutate(duplicate_name = if_else(n > 1, 1, 0)) %>% 
  select(-n) %>%  # drop the helper count column
  ungroup()
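
If you would rather not depend on dplyr, the same 0/1 flag can probably be built in base R with duplicated(). This is only a sketch of an alternative, not part of the approach above. The helper name is_dup is made up for illustration.

```r
# Data from the question
df10 <- data.frame(
  id = c(9143, 2357, 4339, 8927, 9143, 4285, 2683, 8217,
         3702, 7857, 3255, 4262, 8501, 7111, 2681, 6970),
  name = c("xly,mnn", "xab,Lan", "mhy,mun", "vgtu,mmc", "ftu,sdh", "kull,nnhu",
           "hula,njam", "mund,jiha", "htfy,ntha", "sghu,njui", "sgyu,hytb",
           "vdti,kula", "mftyu,huta", "mhuk,ghul", "cday,bhsue", "ajtu,nudj")
)

# duplicated() marks only the second and later occurrences of a value,
# so combine it with a reverse scan (fromLast = TRUE) to flag the first one too
is_dup <- duplicated(df10$id) | duplicated(df10$id, fromLast = TRUE)

# Convert TRUE/FALSE to 1/0
df10$duplicate_name <- as.integer(is_dup)
```

With this data, only the two rows with id 9143 are flagged.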
