Merging rows in a contact sheet so there are no duplicates

If there is a list of contacts in an excel spread sheet and some rows have the same information but also new, is there a way to merge these rows? Is there a way to merge the rows in R?

Raw Data Example:
EMAIL---- ------ -FIRST NAME---------LAST NAME----------------PHONE------------------ADDRESS
jsmith@aol.com____John____________Smith________________________________665 W Lane
jsmith@aol.com____John____________Smith______________(222)-455-6678
terry@gmail.com____Terry___________Jones_____________(555)-999-9799 ______ 222 Lane St
karrie@hotmail.com __ Kate_________Arrie______________(324)-456-7899_______1234 75th Ave
__________________ Liam__________George____________(679)-567-4578____3456 Cherry Way
liamg2@aol.com_____Liam__________George ____________________________3456

To ideal data:
EMAIL FIRST NAME LAST NAME PHONE ADDRESS
jsmith@aol.com____John_____________Smith___________(222)-455-6678______665 W Lane
terry@gmail.com___Terry_____________Jones___________(555)-999-9799______222 Lane St
karrie@hotmail.com_Kate_____________Arrie____________(324)-456-7899______1234 75th Ave
liamg2@aol.com____Liam____________George__________(679)-567-4578______3456 Cherry Way

I am looking for a filled out contact sheet with no duplicates.

Hi @feelingjuicy: Have you already imported the excel file into R?

1 Like

Yes, I have imported it

Could you run the command dput(head(your_table)) and post its output here, like this?

<-- paste output here

Yes here it is:

>dput(head(contacts))

structure(list(Email = c("jsmith@aol.com", "jsmith@aol.com", 
"terry@gmail.com", "karrie@hotmail.com ", NA, "liamg2@aol.com"
), `First Name` = c("John", "John", "Terry", "Kate", "Liam", 
"Liam"), `Last Name` = c("Smith", "Smith", "Jones", "Arrie", 
"George", "George"), Phone = c(NA, "(222)-455-6678", "(555)-999-9799", 
"(555)-456-7899", NA, "(555)-567-4578"), Address = c("665 W lane", 
NA, "222 Lane St", "1234 75th Ave", "3456 Cherry Way", "3456"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))

Thanks, @feelingjuicy: Are you assuming the first and last name determine an individual? (So there can only be one Liam George.) And if not, what information would be enough to identify an individual?

(I'm also reposting your sample data so it's easy to copy, paste and use.)

sample_data <- structure(list(Email = c("jsmith@aol.com", "jsmith@aol.com", 
"terry@gmail.com", "karrie@hotmail.com ", NA, "liamg2@aol.com"
), `First Name` = c("John", "John", "Terry", "Kate", "Liam", 
"Liam"), `Last Name` = c("Smith", "Smith", "Jones", "Arrie", 
"George", "George"), Phone = c(NA, "(222)-455-6678", "(555)-999-9799", 
"(555)-456-7899", NA, "(555)-567-4578"), Address = c("665 W lane", 
NA, "222 Lane St", "1234 75th Ave", "3456 Cherry Way", "3456"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
sample_data %>% 
group_by(`First Name`,`Last Name`) %>% 
summarise(Email=max(Email,na.rm=TRUE), 
Phone=max(Phone,na.rm=TRUE),
Address=max(Address,na.rm=TRUE))

or more generally

sample_data %>% 
group_by(`First Name`,`Last Name`) %>% 
summarise_all(~max(.,na.rm=TRUE))

I am assuming that if someone has the same first and last name as well as email or a matching phone number then they will be the same. Does that make sense?

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.