How to filter out groups consisting of only one entry

sdeepj · June 3, 2020, 5:48pm

In my dataset, I'm looking for values that occur more than once in a column. I first grouped the dataset by that column, let's call it column X. I now want to get rid of all the groups in column X that consist of only one entry. For example, I want to go from

X       Y       Z
1        A       B
1        A       C
1        G       I
2        R       D
3        A       E
3        A       O

To

X       Y       Z
1        A       B
1        A       C
1        G       I
3        A       E
3        A       O

FJCC · June 3, 2020, 6:14pm

Here is one solution. I stored your data in a file named Dummy.csv.

library(dplyr)
DF <- read.csv("~/R/Play/Dummy.csv", stringsAsFactors = FALSE)
# Count the rows in each group of X and keep the groups with exactly one row
Singles <- DF %>% 
  group_by(X) %>% 
  summarize(COUNT = n()) %>% 
  filter(COUNT == 1)
Singles
#> # A tibble: 1 x 2
#>       X COUNT
#>   <int> <int>
#> 1     2     1
# Drop every row of DF whose X matches one of the single-entry groups
DFnew <- anti_join(DF, Singles, by = "X")
DFnew
#>   X Y Z
#> 1 1 A B
#> 2 1 A C
#> 3 1 G I
#> 4 3 A E
#> 5 3 A O

Created on 2020-06-03 by the reprex package (v0.3.0)
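
If you would rather select the groups to keep than the ones to drop, the same idea can be mirrored with count() and semi_join(). This is only a sketch and not part of the answer above: it assumes the example data is rebuilt inline instead of read from Dummy.csv, and the names Multiples and DFkept are made up for illustration.

```r
library(dplyr)

# Example data from the question, built inline (assumption: stands in for Dummy.csv)
DF <- data.frame(
  X = c(1, 1, 1, 2, 3, 3),
  Y = c("A", "A", "G", "R", "A", "A"),
  Z = c("B", "C", "I", "D", "E", "O"),
  stringsAsFactors = FALSE
)

# Values of X that occur more than once (Multiples is a hypothetical name)
Multiples <- DF %>%
  count(X, name = "COUNT") %>%
  filter(COUNT > 1)

# Keep only the rows of DF whose X appears in Multiples
DFkept <- semi_join(DF, Multiples, by = "X")
```

With the example data this should leave the five rows where X is 1 or 3, the same rows that anti_join() keeps when it is given the single-entry groups.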

mfherman · June 3, 2020, 6:45pm

A related approach to @FJCC's answer is to filter on the group size directly, keeping only the groups where n() is greater than 1:

library(tidyverse)

df <- tribble(
  ~X, ~Y, ~Z,
  1, "A", "B",
  1, "A", "C",
  1, "G", "I",
  2, "R", "D",
  3, "A", "E",
  3, "A", "O"
)

df %>% 
  group_by(X) %>% 
  filter(n() > 1) %>% 
  ungroup()
#> # A tibble: 5 x 3
#>       X Y     Z    
#>   <dbl> <chr> <chr>
#> 1     1 A     B    
#> 2     1 A     C    
#> 3     1 G     I    
#> 4     3 A     E    
#> 5     3 A     O

Created on 2020-06-03 by the reprex package (v0.3.0)
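
For anyone who cannot load the tidyverse, the same filter on group size can probably be written in base R. The sketch below is an assumption on my part rather than something from the answers above: it uses ave() to compute the size of each row's group, and the names group_size and DFmulti are hypothetical.

```r
# Example data from the question, built inline
DF <- data.frame(
  X = c(1, 1, 1, 2, 3, 3),
  Y = c("A", "A", "G", "R", "A", "A"),
  Z = c("B", "C", "I", "D", "E", "O"),
  stringsAsFactors = FALSE
)

# For every row, the number of rows that share its value of X.
# seq_along() is used as the input so the result is an integer count
# even when X itself is a character column.
group_size <- ave(seq_along(DF$X), DF$X, FUN = length)

# Keep the rows whose group has more than one entry
DFmulti <- DF[group_size > 1, ]
```

Unlike the dplyr versions, indexing a data frame this way keeps the original row names, so call rownames(DFmulti) <- NULL afterwards if you want them renumbered.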
