Finding duplicate values is not working properly

I have a table like the one below and I want to find duplicate values in a few columns. Finding duplicates works, but if a value appears three or four times, every occurrence should be flagged as a duplicate, not just the later ones.

df <- data.frame(ID =c("DEV2962","KTN2252","KTN2252","ANA2548","DEV2698","HRT2921",NA,"KTN2624","ANA2548","ITI2535","DEV2732","HRT2837","ANA2548","KTN2542","ANA2813","ITI2210"),
                 city=c("del","mum","nav","pun","bang","chen","triv","vish","del","mum","bang","vish","bhop","kol","noi","gurg"),
                 Name= c("dev,akash","singh,rahul","abbas,salman","lal,ram","singh,nkunj","sharma,nikita","ali,sarman","singh,kunal","tomar,lakhan","thakur,praveen","ali,sarman","khan,zuber","singh,giriraj","sharma,lokesh","sharma,nikita","sharma,nikita"))


colss <- c("ID","Name")

df <- df %>% mutate(
  across(.cols= colss,
         .fns = duplicated,
         .names = "{c(1,9)}. unique {col}"))

The output should flag every occurrence of any value that appears more than once:

ID       city  Name            1. duplicate_id  9. duplicate_Name
DEV2962  del   dev,akash
KTN2252  mum   singh,rahul     duplicate_id
KTN2252  nav   abbas,salman    duplicate_id
ANA2548  pun   lal,ram         duplicate_id
DEV2698  bang  singh,nkunj
HRT2921  chen  sharma,nikita                    duplicate_name
NA       triv  ali,sarman                       duplicate_name
KTN2624  vish  singh,kunal
ANA2548  del   tomar,lakhan    duplicate_id
ITI2535  mum   thakur,praveen
DEV2732  bang  ali,sarman                       duplicate_name
HRT2837  vish  khan,zuber
ANA2548  bhop  singh,giriraj   duplicate_id
KTN2542  kol   sharma,lokesh
ANA2813  noi   sharma,nikita                    duplicate_name
ITI2210  gurg  sharma,nikita                    duplicate_name

The function duplicated is a little unintuitive: it returns FALSE for the first occurrence of a value and TRUE only for the later repeats, so the first member of each duplicated set is never flagged. Take a close look at this.

library(tidyverse)

# this is how duplicated works  
duplicated(c(1, 1, 2, 3, 4))  
#> [1] FALSE  TRUE FALSE FALSE FALSE
  
df <- data.frame(ID =c("DEV2962","KTN2252","KTN2252","ANA2548","DEV2698","HRT2921",NA,"KTN2624","ANA2548","ITI2535","DEV2732","HRT2837","ANA2548","KTN2542","ANA2813","ITI2210"),
                 city=c("del","mum","nav","pun","bang","chen","triv","vish","del","mum","bang","vish","bhop","kol","noi","gurg"),
                 Name= c("dev,akash","singh,rahul","abbas,salman","lal,ram","singh,nkunj","sharma,nikita","ali,sarman","singh,kunal","tomar,lakhan","thakur,praveen","ali,sarman","khan,zuber","singh,giriraj","sharma,lokesh","sharma,nikita","sharma,nikita"))

# you probably want this instead
df <- mutate(df, dup = Name %in% Name[duplicated(Name)])
df
#>         ID city           Name   dup
#> 1  DEV2962  del      dev,akash FALSE
#> 2  KTN2252  mum    singh,rahul FALSE
#> 3  KTN2252  nav   abbas,salman FALSE
#> 4  ANA2548  pun        lal,ram FALSE
#> 5  DEV2698 bang    singh,nkunj FALSE
#> 6  HRT2921 chen  sharma,nikita  TRUE
#> 7     <NA> triv     ali,sarman  TRUE
#> 8  KTN2624 vish    singh,kunal FALSE
#> 9  ANA2548  del   tomar,lakhan FALSE
#> 10 ITI2535  mum thakur,praveen FALSE
#> 11 DEV2732 bang     ali,sarman  TRUE
#> 12 HRT2837 vish     khan,zuber FALSE
#> 13 ANA2548 bhop  singh,giriraj FALSE
#> 14 KTN2542  kol  sharma,lokesh FALSE
#> 15 ANA2813  noi  sharma,nikita  TRUE
#> 16 ITI2210 gurg  sharma,nikita  TRUE

Created on 2021-06-23 by the reprex package (v1.0.0)
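
As a side note, the same "flag every member of the set" result can also be sketched in base R by combining duplicated() with its fromLast argument. The vector x below is made up for illustration and is not part of the data in this thread.

```r
# Made-up example vector (an assumption, not the thread's data)
x <- c("a", "b", "a", "c", "a", NA)

# Forward scan: the first "a" is FALSE, later "a"s are TRUE
fwd <- duplicated(x)

# Backward scan: the last "a" is FALSE, earlier "a"s are TRUE
bwd <- duplicated(x, fromLast = TRUE)

# A value is part of a duplicated set if either scan flags it
all_dups <- fwd | bwd

# The %in% approach from the answer above gives the same flags here
same <- x %in% x[duplicated(x)]

all_dups
identical(all_dups, same)
```

Either form should work inside mutate(); the %in% version is a bit shorter, while the fromLast version avoids building the intermediate vector of repeated values.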

I want to do this for a list of columns, with the results added as new (mutated) columns. Is there anything I can modify in my code?

library(tidyverse)

df <- data.frame(ID =c("DEV2962","KTN2252","KTN2252","ANA2548","DEV2698","HRT2921",NA,"KTN2624","ANA2548","ITI2535","DEV2732","HRT2837","ANA2548","KTN2542","ANA2813","ITI2210"),
                 city=c("del","mum","nav","pun","bang","chen","triv","vish","del","mum","bang","vish","bhop","kol","noi","gurg"),
                 Name= c("dev,akash","singh,rahul","abbas,salman","lal,ram","singh,nkunj","sharma,nikita","ali,sarman","singh,kunal","tomar,lakhan","thakur,praveen","ali,sarman","khan,zuber","singh,giriraj","sharma,lokesh","sharma,nikita","sharma,nikita"))

df2 <- bind_cols(
  df,
  df %>% 
    select(ID, city, Name) %>%   # which columns to find duplicates
    mutate_all(function(x) x %in% x[duplicated(x)]) %>% 
    rename_all(paste0, "_dup")
)
df2
#>         ID city           Name ID_dup city_dup Name_dup
#> 1  DEV2962  del      dev,akash  FALSE     TRUE    FALSE
#> 2  KTN2252  mum    singh,rahul   TRUE     TRUE    FALSE
#> 3  KTN2252  nav   abbas,salman   TRUE    FALSE    FALSE
#> 4  ANA2548  pun        lal,ram   TRUE    FALSE    FALSE
#> 5  DEV2698 bang    singh,nkunj  FALSE     TRUE    FALSE
#> 6  HRT2921 chen  sharma,nikita  FALSE    FALSE     TRUE
#> 7     <NA> triv     ali,sarman  FALSE    FALSE     TRUE
#> 8  KTN2624 vish    singh,kunal  FALSE     TRUE    FALSE
#> 9  ANA2548  del   tomar,lakhan   TRUE     TRUE    FALSE
#> 10 ITI2535  mum thakur,praveen  FALSE     TRUE    FALSE
#> 11 DEV2732 bang     ali,sarman  FALSE     TRUE     TRUE
#> 12 HRT2837 vish     khan,zuber  FALSE     TRUE    FALSE
#> 13 ANA2548 bhop  singh,giriraj   TRUE    FALSE    FALSE
#> 14 KTN2542  kol  sharma,lokesh  FALSE    FALSE    FALSE
#> 15 ANA2813  noi  sharma,nikita  FALSE    FALSE     TRUE
#> 16 ITI2210 gurg  sharma,nikita  FALSE    FALSE     TRUE

Created on 2021-06-23 by the reprex package (v1.0.0)
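
If you would rather keep the across() call from the original question, a minimal sketch along the following lines might work. It assumes dplyr 1.0 or later. The small data frame, the helper name is_dup, and the "_dup" suffix are made up for illustration; the labels follow the "duplicate_<column>" pattern from the desired output.

```r
library(dplyr)

# Small made-up data frame (an assumption, not the thread's data)
df <- data.frame(
  ID   = c("A1", "B2", "B2", "C3", NA, "C3", "C3"),
  Name = c("x", "y", "z", "x", "w", "x", "v")
)

cols <- c("ID", "Name")  # columns to check for duplicates

# Hypothetical helper: TRUE for every row whose value occurs more than once
is_dup <- function(x) x %in% x[duplicated(x)]

df2 <- df %>%
  mutate(across(
    all_of(cols),
    # label flagged rows with the column name, leave the rest empty
    ~ ifelse(is_dup(.x), paste0("duplicate_", cur_column()), ""),
    .names = "{.col}_dup"
  ))

df2
```

Dropping the ifelse() and passing is_dup directly as the function would give plain TRUE/FALSE columns instead of text labels.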
