Validating duplicate records in a data frame

I am flagging duplicates in a data frame, but how can I ignore NAs and blank cells? Is there a solution?

df4 <- data.frame(emp_id =c("DEV-2962","KTN_2252","ANA2719","ITI_2624","KTN_2252","HRT2921","","KTN2624","DEV2698","ITI2535","DEV2698","HRT2837","ERV2951","KTN2542","ANA2813","ITI2210"),
                  email = c("akash.dev@abcd.com","rahul.singh@abcd.com","salman.abbas@abcd.com","ram.lal@abcd.com","rahul.singh@abcd.com","prabal.garg@xyz.com","sanu.ali@abcd.com","salman.abbas@abcd.com","","",NA,NA,"giriraj.singh@dkl.com","lokesh.sharma@abcd.com","pooja.pawar@abcd.com","nikita.sharma@abcd.com"))


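library(dplyr)

ID = "emp_id"
Email = "email"

# Flag every repeated value with 1; repeated NAs and blanks are flagged too, which is the problem
df4 <- df4 %>% 
  mutate(across(all_of(c(ID, Email)), ~as.integer(duplicated(.)), .names = 'flag_{col}'))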

duplicated() has an incomparables argument that should do what you want. The call below skips NA values; pass c(NA, "") instead to skip blank strings as well:

df4 %>% 
  mutate(across(everything(),
                ~as.integer(duplicated(., incomparables = NA_character_)), .names = 'flag_{col}'))

duplicated() takes an incomparables argument: a vector of values that should never be treated as duplicates. Modifying your example as follows should give what you're looking for, I think.

library(tidyverse)

df4 <- data.frame(emp_id =c("DEV-2962","KTN_2252","ANA2719","ITI_2624","KTN_2252","HRT2921","","KTN2624","DEV2698","ITI2535","DEV2698","HRT2837","ERV2951","KTN2542","ANA2813","ITI2210"),
                  email = c("akash.dev@abcd.com","rahul.singh@abcd.com","salman.abbas@abcd.com","ram.lal@abcd.com","rahul.singh@abcd.com","prabal.garg@xyz.com","sanu.ali@abcd.com","salman.abbas@abcd.com","","",NA,NA,"giriraj.singh@dkl.com","lokesh.sharma@abcd.com","pooja.pawar@abcd.com","nikita.sharma@abcd.com"))


ID = "emp_id"
Email = "email"

# NA and "" are listed as incomparables, so they are never flagged as duplicates
df4 <- df4 %>% 
  mutate(across(all_of(c(ID, Email)), ~as.integer(duplicated(., incomparables = c(NA, ""))), .names = 'flag_{col}'))
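
To see what incomparables changes, here is a minimal base R sketch on a small made-up vector. The vector x and the names default_flags and ignore_flags are only for illustration and are not part of the original data.

```r
# Made-up vector with one repeated value, two blanks, and two NAs
x <- c("a", "b", "a", "", "", NA, NA)

# Default behaviour: the second blank and the second NA count as duplicates
default_flags <- as.integer(duplicated(x))

# With incomparables, blanks and NAs are never flagged
ignore_flags <- as.integer(duplicated(x, incomparables = c(NA, "")))

print(default_flags)  # 0 0 1 0 1 0 1
print(ignore_flags)   # 0 0 1 0 0 0 0
```

Only the second "a" stays flagged in the second result, which is the same behaviour the across() call above applies to each column.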
