Check email data for duplicate addresses across domains

I have about a million records of data. Here is a sample:

DEPT             city  email  two_mail  Dept_Disc
Human Resource   del          0         0
Infrastructure   mum          1         0
Human Resource   nav          0         0
Infrastructure   pun          0         0
Human Resources  bang         0         1
Infrastructure   chen         0         0
Human Resource   triv         0         0
Infrastructure   vish         0         0

I want to check whether any values in the DEPT column are misspelled, e.g. "Human Resources".

Also, if a person has given a general (personal) email ID, I want to check whether we also have a corporate email ID for that same person.

For example, if someone has given a personal email ID, I want to check whether we also have a corporate email ID for the same frv.xxt person (frv is the first name and xxt is the last name).
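A minimal sketch of the personal-vs-corporate check, written in base R so it needs no packages. It assumes every address follows the first.last@domain pattern described above, and it uses hypothetical addresses with `corp.example` standing in for the company domain (replace it with the real one). An address is flagged when it is not corporate but its first.last part also appears with the corporate domain.

```r
# Hypothetical sample addresses; "corp.example" stands in for the real corporate domain
emails <- c("frv.xxt@mail.example", "frv.xxt@corp.example",
            "abc.def@mail.example", "ghi.jkl@corp.example")
corp_domain <- "corp.example"  # assumption: replace with your company's domain

# Split each address into the part before "@" (first.last) and the domain
local  <- tolower(sub("@.*$", "", emails))
domain <- tolower(sub("^[^@]*@", "", emails))

# Split first.last into first and last name (assumes the first "." separates them)
first <- sub("\\..*$", "", local)
last  <- sub("^[^.]*\\.", "", local)

is_corp <- domain == corp_domain

# TRUE for a non-corporate address whose first.last also exists with the corporate domain
has_corp_match <- !is_corp & local %in% local[is_corp]

result <- data.frame(email = emails, first = first, last = last,
                     is_corp = is_corp, has_corp_match = has_corp_match)
print(result)
```

Matching on the part before the "@" only works if the personal and corporate addresses use the same first.last spelling; lower-casing both sides handles case differences but not other variations such as initials or extra digits.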

The output I need is:


Some ideas for you

## setup
library(dplyr)    # mutate() and the %>% pipe
library(stringr)  # str_extract()

# The email values did not survive in this copy of the post; "" stands in for them
samp_df <- tibble::tribble(
              ~DEPT,  ~city, ~email,
   "Human Resource",  "del",     "",
   "Infrastructure",  "mum",     "",
   "Human Resource",  "nav",     "",
   "Infrastructure",  "pun",     "",
  "Human Resources", "bang",     "",
   "Infrastructure", "chen",     "",
   "Human Resource", "triv",     "",
   "Infrastructure", "vish",     "",
   "Infrastructure", "city",     ""
)

## Handling department misspellings

#list of correctly spelled department names
lex <- c("Human Resource", "Infrastructure")

samp_df <- samp_df %>% 
  mutate(Dept_Disc = !(DEPT %in% lex))  # TRUE when DEPT is not an exact lexicon match

## Handling dup emails
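# str_extract() - keep the part of each address before the "@"
# duplicated()  - TRUE when the same value appears again further down (fromLast = TRUE)

samp_df <- samp_df %>% 
  mutate(
    two_mail = str_extract(email, "([^@]+)"),
    two_mail = duplicated(two_mail, fromLast = TRUE)
  )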

#> # A tibble: 9 x 5
#>   DEPT            city  email                   Dept_Disc two_mail
#>   <chr>           <chr> <chr>                   <lgl>     <lgl>
#> 1 Human Resource  del                           FALSE     FALSE
#> 2 Infrastructure  mum                           FALSE     TRUE
#> 3 Human Resource  nav                           FALSE     FALSE
#> 4 Infrastructure  pun                           FALSE     FALSE
#> 5 Human Resources bang                          TRUE      FALSE
#> 6 Infrastructure  chen                          FALSE     TRUE
#> 7 Human Resource  triv                          FALSE     FALSE
#> 8 Infrastructure  vish                          FALSE     FALSE
#> 9 Infrastructure  city                          FALSE     FALSE

Created on 2020-09-08 by the reprex package (v0.3.0)
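Exact matching with `%in%` flags a misspelled department but does not say which name was meant. As a sketch, base R's `adist()` (generalized Levenshtein edit distance) can suggest the closest entry in the lexicon. The department values below are hypothetical, and the cutoff of 2 edits is an assumption to tune against the real data.

```r
# Lexicon of correctly spelled names (from the post) and hypothetical values to check
lex  <- c("Human Resource", "Infrastructure")
dept <- c("Human Resource", "Infrastructure", "Human Resources",
          "Infrastructur", "Finance")

# Edit distance from every value (rows) to every lexicon entry (columns)
d <- adist(dept, lex)

nearest  <- lex[apply(d, 1, which.min)]  # closest lexicon entry per value
min_dist <- apply(d, 1, min)             # distance to that entry

# Suggest a fix only for near misses; 2 edits is an assumed threshold
max_edits  <- 2
suggestion <- ifelse(min_dist > 0 & min_dist <= max_edits, nearest, NA)

result <- data.frame(DEPT = dept, Dept_Disc = !(dept %in% lex),
                     suggestion = suggestion)
print(result)
```

Values that are far from every lexicon entry (here "Finance") are still flagged by `Dept_Disc` but get no suggestion, so they can be reviewed by hand. With a million rows it is cheaper to run `adist()` on `unique(DEPT)` and join the suggestions back.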
