Validate data with the help of email ID

I have a data frame like the one below. I want to check whether the part of the email before the @ is duplicated and, if it is, add a new column with 1 for TRUE and 0 for FALSE. I am able to do this part.

df <- data.frame(ID =c("DEV2962","KTN2252","ANA2719","ITI2624","DEV2698","HRT2921","","KTN2624","ANA2548","ITI2535","DEV2732","HRT2837","ERV2951","KTN2542","ANA2813","ITI2210"),
                 city=c("del","mum","nav","pun","bang","chen","triv","vish","del","mum","bang","vish","bhop","kol","noi","gurg"),
                 email = c("akash.dev@xyzpart.com","rahul.singh@xyzpart.com","salman.abbas@xyzpart.com","ram.lal@xyzpart.com","ram.lal@xyzpart.com","prabal.garg@xyzpart.com","sanu.ali@xyzpart.com","kunal.singh@xyzpart.com","lakhan.tomar@xyzpart.com","praveen.thakur@xyzpart.com","sarman.ali@xyzpart.com","zuber.khan@xyzpart.com","giriraj.singh@xyzpart.com","lokesh.sharma@xyzpart.com","pooja.pawar@xyzpart.com","nikita.sharma@xyzpart.com"),
                 name= c("dev,akash","singh,rahul","abbas,salman","lal,ram","singh,nkunj","garg,prabal","ali,sanu","singh,kunal","tomar,lakhan","thakur,praveen","ali,sarman","khan,zuber","singh,giriraj","sharma,lokesh","pawar,pooja","sharma,nikita"))


library(dplyr)    # needed for %>%, mutate() and select()
library(stringr)
df <- df %>% 
  mutate(first = str_extract(email, "[^@]+"),   # part of the email before the @
         duplicate = as.numeric(duplicated(first))) %>% 
  select(-first)
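As a quick check of what this flags, here is a minimal sketch on a made-up vector (`emails`, `local_part`, `dup_later` and `dup_any` are illustrative names, not part of the data above). `duplicated()` marks only the second and later occurrences, so the first row of a repeated address stays 0; combining it with `fromLast = TRUE` would flag every occurrence instead, if that is what is wanted.

```r
library(stringr)

# Illustrative emails; not part of the original data
emails <- c("ram.lal@xyzpart.com", "sanu.ali@xyzpart.com", "ram.lal@xyzpart.com")

# Everything before the first "@"
local_part <- str_extract(emails, "[^@]+")

# 1 for the second and later occurrences of a local part, 0 otherwise
dup_later <- as.numeric(duplicated(local_part))

# 1 for every row whose local part occurs more than once
dup_any <- as.numeric(duplicated(local_part) |
                        duplicated(local_part, fromLast = TRUE))
```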

I also have an older data frame with the same structure. For each email ID in the new data, I want to check whether it is present in the old data frame and, if it is, validate that all the other variables (city, name, ID) match exactly. If they do not match, a new column should flag the discrepancy (1 for TRUE, 0 for FALSE). The expected output is:

ID       city  email                       name            duplicate_name  discrepancy
DEV2962  del   akash.dev@xyzcomp.com       dev,akash       0               0
KTN2252  mum   rahul.singh@xyzcomp.com     singh,rahul     0               0
ANA2719  nav   salman.abbas@xyzcomp.com    abbas,salman    0               0
ITI2624  pun   ram.lal@xyzcomp.com         lal,ram         0               0
DEV2698  bang  ram.lal@xyzcomp.com         singh,nkunj     1               0
HRT2921  chen  prabal.garg@xyzcomp.com     garg,prabal     0               0
         triv  sanu.ali@xyzcomp.com        ali,sanu        0               0
KTN2624  vish  kunal.singh@xyzcomp.com     singh,kunal     0               0
ANA2548  del   lakhan.tomar@xyzcomp.com    tomar,lakhan    0               0
ITI2535  mum   praveen.thakur@xyzcomp.com  thakur,praveen  0               1
DEV2732  bang  sarman.ali@xyzcomp.com      ali,sarman      0               0
HRT2837  vish  zuber.khan@xyzcomp.com      khan,zuber      0               0
ERV2951  bhop  giriraj.singh@xyzcomp.com   singh,giriraj   0               0
KTN2542  kol   lokesh.sharma@xyzcomp.com   sharma,lokesh   0               0
ANA2813  noi   pooja.pawar@xyzcomp.com     pawar,pooja     0               0
ITI2210  gurg  nikita.sharma@xyzcomp.com   sharma,nikita   0               0

Here is an example of how I can find differences between the rows that two tables have in common (matched on id):

library(tidyverse)

# two small example tables that share some ids
(data1 <-
  tibble(id = letters[1:3],
         a  = LETTERS[23:25],
         b  = c(4L, 1L, 2L)))

(data2 <-
  tibble(id = letters[1:4],
         a  = LETTERS[23:26],
         b  = 4:1))

# which ids are common to data1 and data2?
(commonids <- inner_join(data1, data2, by = "id") %>% pull(id))

# find discrepancies: TRUE when the rows for an id differ between the two tables
# (all_equal() is deprecated as of dplyr 1.1.0; all.equal() is the suggested replacement)
discrep <- map_lgl(commonids,
                   ~ !isTRUE(all_equal(filter(data1, id %in% .),
                                       filter(data2, id %in% .))))

# attach them to data1; ids that are not in data2 stay FALSE
data1$discrep <- FALSE
data1[which(data1$id %in% commonids), "discrep"] <- discrep
data1
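Applied to the data in the question, the same idea could also be sketched with a join on `email`. This is only a sketch under assumptions: `new_df` and `old_df` are hypothetical stand-ins (the old data frame was not posted), the `_old` suffix and the `in_old` and `same` helper columns are made up for illustration, and each email is assumed to appear at most once in `old_df` with no missing values in the compared columns.

```r
library(dplyr)

# Hypothetical new and old data with the same columns as in the question
new_df <- tibble(
  ID    = c("DEV2962", "ITI2535", "ERV2951"),
  city  = c("del", "mum", "bhop"),
  email = c("akash.dev@xyzpart.com", "praveen.thakur@xyzpart.com",
            "giriraj.singh@xyzpart.com"),
  name  = c("dev,akash", "thakur,praveen", "singh,giriraj")
)
old_df <- tibble(
  ID    = c("DEV2962", "ITI2535"),
  city  = c("del", "pun"),   # invented mismatch: city differs for praveen.thakur
  email = c("akash.dev@xyzpart.com", "praveen.thakur@xyzpart.com"),
  name  = c("dev,akash", "thakur,praveen")
)

result <- new_df %>%
  # bring in the old values; in_old marks emails that exist in old_df
  left_join(mutate(old_df, in_old = TRUE), by = "email",
            suffix = c("", "_old")) %>%
  mutate(
    in_old      = coalesce(in_old, FALSE),
    # TRUE only when ID, city and name all match the old record
    same        = ID == ID_old & city == city_old & name == name_old,
    # flag rows that exist in old_df but differ from it
    discrepancy = as.numeric(in_old & !same)
  ) %>%
  select(ID, city, email, name, discrepancy)

result
```

Emails that are absent from `old_df` get 0 here, because there is nothing to compare against. If such rows should be reported separately, the `in_old` column could be kept in the output instead of being dropped.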
