Hello, my question has two parts:
- Is there a dplyr/tidyverse function that is the opposite of distinct(), i.e. one that shows all rows that are not unique?
- How do you pass many variables (column names) to distinct() or add_count() without typing each one? I think you probably need some combination of !!, enquo(), and vars(), but I could not figure it out.
Example:
library(dplyr)  # provides %>%, distinct(), add_count(), filter()

df <- dplyr::tribble(~t, ~u, ~v, ~w, ~x, ~y, ~z,
1, "a", 'a', 'b', 4, 'c', NA,
1, "a", 'a', 'b', 4, 'c', 5,
2, "a", 'a', 'b', 4, 'c', 5,
3, "b", 'b', 'b', 8, 'c', 9,
3, "a", 'b', 'b', 10, 'c', 25,
4, "c", 'a', 'b', 4, 'c', 5,
4, "c", 'a', 'b', 4, 'c', NA)
## want a tibble with duplicates removed, but don't use last column in
## identifying duplicates
df %>% distinct(t, u, v, w, x, y, .keep_all = TRUE)
# A tibble: 5 x 7
#       t u     v     w         x y         z
#   <dbl> <chr> <chr> <chr> <dbl> <chr> <dbl>
# 1     1 a     a     b         4 c        NA
# 2     2 a     a     b         4 c         5
# 3     3 b     b     b         8 c         9
# 4     3 a     b     b        10 c        25
# 5     4 c     a     b         4 c         5
## want to look at all duplicates, again excluding last column from finding
## duplicates
df %>% add_count(t, u, v, w, x, y) %>% filter(n > 1)
# A tibble: 4 x 8
#       t u     v     w         x y         z     n
#   <dbl> <chr> <chr> <chr> <dbl> <chr> <dbl> <int>
# 1     1 a     a     b         4 c        NA     2
# 2     1 a     a     b         4 c         5     2
# 3     4 c     a     b         4 c         5     2
# 4     4 c     a     b         4 c        NA     2
This is my first post. I have tried to follow all the guidelines, but I am happy to receive feedback or corrections.