Find and remove duplicates across two fields

Hello all,

I'm looking for an easy way to find everyone who completed my survey more than once. They weren't supposed to be able to do this, since I had ballot-box stuffing disabled in Qualtrics, but I noticed multiple copies of the same name anyway.

In essence, I have two variables in my data (first name and last name) that I will use to identify the duplicate response rows to delete. I can't use just one or the other because, across almost 600 responses, some people are bound to share a first name or a last name. A row therefore only counts as a duplicate if the same first and last name combination appears on another row as well.

At a quick glance in Excel I can see at least 4 people who completed the survey twice, but I'd much rather automate this in R than do it by hand.

Here's an example of how to find duplicates:


library(dplyr)

# Fake data
dat = data.frame(first=c("A","A","B","B", "C", "D"), 
                 last=c("x","y","z","z", "w","u"),
                 value=c(1,2,3,3,4,5))

# Find duplicates (based on same first and last name)
dat %>% 
  group_by(first, last) %>% 
  filter(n() > 1)
#>   first last  value
#> 1 B     z         3
#> 2 B     z         3

# Remove duplicates (keep only first instance of duplicated first and last name combinations)
dat %>% 
  group_by(first, last) %>% 
  slice(1)
#>   first last  value
#> 1 A     x         1
#> 2 A     y         2
#> 3 B     z         3
#> 4 C     w         4
#> 5 D     u         5
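
If you'd rather not load dplyr, the same two steps can probably be done in base R with duplicated(), which accepts a data frame and compares the selected columns row by row. This is only a sketch on the fake data from this post. The names `key`, `is_dupe`, `dupe_rows` and `deduped` are made up for illustration, and on the real survey data you would swap in your own name columns.

```r
# Same fake data as in the example above
dat <- data.frame(first = c("A", "A", "B", "B", "C", "D"),
                  last  = c("x", "y", "z", "z", "w", "u"),
                  value = c(1, 2, 3, 3, 4, 5))

# The two columns that together define a duplicate
key <- dat[, c("first", "last")]

# TRUE for every row whose first/last combination occurs more than once.
# duplicated() alone skips the first occurrence, so scan from both ends.
is_dupe <- duplicated(key) | duplicated(key, fromLast = TRUE)

# All rows involved in a duplicate (for inspection before deleting anything)
dupe_rows <- dat[is_dupe, ]

# Keep only the first occurrence of each first/last combination
deduped <- dat[!duplicated(key), ]
```

Looking at `dupe_rows` before dropping anything is worth doing with survey data, since two different people can genuinely share a full name.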

Something along these lines:

library(dplyr)

find_dupes <- mtcars %>% group_by(mpg, cyl) %>% count() %>% filter(n > 1)
find_dupes
#> # A tibble: 5 x 3
#> # Groups:   mpg, cyl [5]
#>     mpg   cyl     n
#>   <dbl> <dbl> <int>
#> 1  10.4     8     2
#> 2  15.2     8     2
#> 3  21       6     2
#> 4  22.8     4     2
#> 5  30.4     4     2

Created on 2020-01-08 by the reprex package (v0.3.0)
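
One caveat with the count approach: it lists the duplicated combinations but not the rows themselves. A possible follow-up, sketched here on the same mtcars example, is to use semi_join() to pull back every full row that matches one of those combinations. `dupe_rows` is just an illustrative name, and count(mpg, cyl) is used as shorthand for the group_by() plus count() call.

```r
library(dplyr)

# Combinations of mpg and cyl that occur more than once
find_dupes <- mtcars %>%
  count(mpg, cyl) %>%
  filter(n > 1)

# Keep every row of mtcars whose mpg/cyl pair appears in find_dupes
dupe_rows <- mtcars %>%
  semi_join(find_dupes, by = c("mpg", "cyl"))
```

Since each of the five combinations has n = 2, `dupe_rows` should contain ten rows. For the survey data the `by` columns would be the first and last name variables instead.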


You guys are true heroes. Thanks!
