Delete rows based on multiple conditions

Hi, I want to delete (filter) rows based on multiple conditions. The following accomplishes what I want, but there must be a simpler way using the exclude vector:

set.seed(999)
df <- matrix(sample(1:50, 100, replace=T), nrow=10)
df
exclude <- c(27, 29, 54, 88)
df[(df==27 | df==29 | df==54 | df==88)] <- NA
df
df <- df[complete.cases(df), ]
df

 ___[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]

[1,] 27 15 12 2 4 2 44 39 33 3
[2,] 4 5 9 9 41 45 29 23 29 33
[3,] 7 7 31 33 7 30 19 1 45 32
[4,] 41 38 37 27 34 12 20 37 36 26
[5,] 14 19 4 50 13 13 39 34 31 45
[6,] 1 43 15 24 25 44 9 27 46 47
[7,] 10 18 31 40 2 31 4 40 18 13
[8,] 22 1 16 26 13 2 31 34 47 35
[9,] 35 37 29 4 39 27 41 6 16 4
[10,] 23 19 43 28 23 17 20 15 22 8

  ___[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]

[1,] NA 15 12 2 4 2 44 39 33 3
[2,] 4 5 9 9 41 45 NA 23 NA 33
[3,] 7 7 31 33 7 30 19 1 45 32
[4,] 41 38 37 NA 34 12 20 37 36 26
[5,] 14 19 4 50 13 13 39 34 31 45
[6,] 1 43 15 24 25 44 9 NA 46 47
[7,] 10 18 31 40 2 31 4 40 18 13
[8,] 22 1 16 26 13 2 31 34 47 35
[9,] 35 37 NA 4 39 NA 41 6 16 4
[10,] 23 19 43 28 23 17 20 15 22 8

___ [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]

[1,] 7 7 31 33 7 30 19 1 45 32
[2,] 14 19 4 50 13 13 39 34 31 45
[3,] 10 18 31 40 2 31 4 40 18 13
[4,] 22 1 16 26 13 2 31 34 47 35
[5,] 23 19 43 28 23 17 20 15 22 8

set.seed(999)
DF <- matrix(sample(1:50, 100, replace=T), nrow=10)
DF
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#>  [1,]   27   15   12    2    4    2   44   39   33     3
#>  [2,]    4    5    9    9   41   45   29   23   29    33
#>  [3,]    7    7   31   33    7   30   19    1   45    32
#>  [4,]   41   38   37   27   34   12   20   37   36    26
#>  [5,]   14   19    4   50   13   13   39   34   31    45
#>  [6,]    1   43   15   24   25   44    9   27   46    47
#>  [7,]   10   18   31   40    2   31    4   40   18    13
#>  [8,]   22    1   16   26   13    2   31   34   47    35
#>  [9,]   35   37   29    4   39   27   41    6   16     4
#> [10,]   23   19   43   28   23   17   20   15   22     8
exclude <- c(27, 29, 54, 88)
censor <- function(x) ifelse(x %in% exclude,NA,x)
apply(DF,2,censor)
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#>  [1,]   NA   15   12    2    4    2   44   39   33     3
#>  [2,]    4    5    9    9   41   45   NA   23   NA    33
#>  [3,]    7    7   31   33    7   30   19    1   45    32
#>  [4,]   41   38   37   NA   34   12   20   37   36    26
#>  [5,]   14   19    4   50   13   13   39   34   31    45
#>  [6,]    1   43   15   24   25   44    9   NA   46    47
#>  [7,]   10   18   31   40    2   31    4   40   18    13
#>  [8,]   22    1   16   26   13    2   31   34   47    35
#>  [9,]   35   37   NA    4   39   NA   41    6   16     4
#> [10,]   23   19   43   28   23   17   20   15   22     8
1 Like

replace this one line with

df[df %in% exclude] <- NA
2 Likes

Thank you to both experts.
Suppose instead I had
include <- c(16, 26,31)
by which I only want rows including all of them.
Is there a better way than
df <- df[(df==16 & df==26 & df==31),]
which does not work?

set.seed(999)
DF <- matrix(sample(1:50, 100, replace=T), nrow=10)
DF
#>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#>  [1,]   27   15   12    2    4    2   44   39   33     3
#>  [2,]    4    5    9    9   41   45   29   23   29    33
#>  [3,]    7    7   31   33    7   30   19    1   45    32
#>  [4,]   41   38   37   27   34   12   20   37   36    26
#>  [5,]   14   19    4   50   13   13   39   34   31    45
#>  [6,]    1   43   15   24   25   44    9   27   46    47
#>  [7,]   10   18   31   40    2   31    4   40   18    13
#>  [8,]   22    1   16   26   13    2   31   34   47    35
#>  [9,]   35   37   29    4   39   27   41    6   16     4
#> [10,]   23   19   43   28   23   17   20   15   22     8
include <- c(16, 26,31)
m <- matrix(DF %in% include, nrow = 10)
sumof_include <- sum(include)
DF[which(rowSums(m * DF) >= sumof_include),]
#>  [1] 22  1 16 26 13  2 31 34 47 35

Explication: f(x)=y where DF = x, y is a subset of x that satisfies the condition that each row contains the elements in `c(16,26,31) and f is to be composed.

m <- matrix(DF %in% include, nrow = 10)

provides a logical subset of DF that contains only the values in include. By inspection with

m*DF

not all rows contain all elements of include, which can be confirmed with

rowSums(m*DF)

Under the naive assumption that each element of include appear only once

rowSums(m*DF)

will include sum(include), equal to 73.

Finally DF can be subset to show only the rows that meet the condition of containing each element in include at least once.

Not handled is the case when a row of DF might contain multiple occurrences of only one or two of the elements of include such that its rowSum would exceed 73, notwithstanding lack of one element.

That is a reprex for a different time.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.