Find and remove duplicates across two fields

Hello all,

I'm looking for an easy way to find everyone who completed my survey more than once. They weren't supposed to be able to do this: I had Qualtrics set to prevent ballot-box stuffing, but I noticed multiple copies of the same name anyway.

In essence, I have two variables in my data I will use to identify duplicate response rows to be deleted: First.name and Last.name. I can't use just one or the other because obviously, across almost 600 responses, some people are going to have the same first or last names. Therefore, the same first and last name combination needs to appear on two separate rows for it to be a duplicate.

In Excel I can see at least 4 people at a very quick glance who completed the survey twice. But automating this in R would be much better than doing it by hand.

Here's an example of how to find duplicates:

library(tidyverse)

# Fake data
dat = data.frame(first=c("A","A","B","B", "C", "D"), 
                 last=c("x","y","z","z", "w","u"),
                 value=c(1,2,3,3,4,5)) 

# Find duplicates (based on same first and last name)
dat %>% 
  group_by(first, last) %>% 
  filter(n()>1)
#>   first last  value
#> 1 B     z         3
#> 2 B     z         3

# Remove duplicates (keep only first instance of duplicated first and last name combinations)
dat %>% 
  group_by(first, last) %>% 
  slice(1)
#>   first last  value
#> 1 A     x         1
#> 2 A     y         2
#> 3 B     z         3
#> 4 C     w         4
#> 5 D     u         5
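
Applied to the survey described in the question, the same idea might look like the sketch below. The data frame `survey` and its contents are made up for illustration; only the column names First.name and Last.name come from the original post. Trimming whitespace and lower-casing the names before comparing is an extra assumption, added because typed names often differ in capitalisation or trailing spaces. It uses `distinct()` with `.keep_all = TRUE` instead of `slice(1)`; both keep the first row of each name combination.

```r
library(dplyr)

# Hypothetical survey data; only the column names come from the question
survey <- data.frame(
  First.name = c("Ann", "ann ", "Bob", "Bob", "Cara"),
  Last.name  = c("Lee", "Lee", "Ray", "Stone", "Diaz"),
  score      = c(10, 12, 7, 9, 8),
  stringsAsFactors = FALSE
)

# Normalised keys (assumption: case and surrounding spaces should be ignored)
keyed <- survey %>%
  mutate(first_key = tolower(trimws(First.name)),
         last_key  = tolower(trimws(Last.name)))

# Rows whose first + last name combination appears more than once
dupes <- keyed %>%
  group_by(first_key, last_key) %>%
  filter(n() > 1) %>%
  ungroup()

# Keep only the first row for each name combination, then drop the helper keys
deduped <- keyed %>%
  distinct(first_key, last_key, .keep_all = TRUE) %>%
  select(-first_key, -last_key)
```

It is probably worth looking through `dupes` before deleting anything, since two different people can share a full name.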

Something along these lines should work:

suppressPackageStartupMessages(library(dplyr)) 
find_dupes <- mtcars %>% group_by(mpg, cyl) %>% count() %>% filter(n > 1)
find_dupes
#> # A tibble: 5 x 3
#> # Groups:   mpg, cyl [5]
#>     mpg   cyl     n
#>   <dbl> <dbl> <int>
#> 1  10.4     8     2
#> 2  15.2     8     2
#> 3  21       6     2
#> 4  22.8     4     2
#> 5  30.4     4     2

Created on 2020-01-08 by the reprex package (v0.3.0)
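
The counting approach above returns only the duplicated key combinations and how often each occurs. If the full rows are needed, one option is to join the keys back onto the data. This is a rough sketch on the same `mtcars` example; the names `dupe_keys` and `dupe_rows` are just illustrative.

```r
library(dplyr)

# Key combinations that occur more than once
dupe_keys <- mtcars %>%
  count(mpg, cyl) %>%
  filter(n > 1)

# Full rows belonging to those combinations
dupe_rows <- mtcars %>%
  semi_join(dupe_keys, by = c("mpg", "cyl"))
```

`semi_join()` keeps the rows of `mtcars` that have a match in `dupe_keys` without adding the `n` column.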


You guys are true heroes. Thanks!
