Filter out peptides with > 45% missing/zero values

Hello,

I am trying to filter out peptides with more than 45% zero values. I have already imputed the NA values, and now want to keep only the peptides with fewer than 45% zeros.

Here is an example of what my data looks like (P on the left stands for participant).

#>     Peptide 1  Peptide 2  Peptide 3  Peptide 4  Peptide 5
#> P1 0.06637717 0.22459742 0.05149364 0.12442481 0.23367631
#> P2 0.09303097 0.23616882 0.04413919 0.17940463 0.05303563
#> P3 0.00000000 0.16519945 0.17175571 0.24797652 0.16291844
#> P4 0.00000000 0.15727851 0.09602593 0.09500879 0.03138877
#> P5 0.00000000 0.01544657 0.19246035 0.19436131 0.06680517

Here is a way to do it using functions from base R.

#Make some data
DF <- data.frame(Subject = paste0("P", 1:10), stringsAsFactors = FALSE)
DataMat <- matrix(sample(0:1, size = 100, replace = TRUE), nrow = 10)
colnames(DataMat) <- paste0("Peptide", 1:10)
DF <- cbind(DF, as.data.frame(DataMat))
DF
#>    Subject Peptide1 Peptide2 Peptide3 Peptide4 Peptide5 Peptide6 Peptide7
#> 1       P1        1        0        0        1        1        1        0
#> 2       P2        1        0        0        0        1        1        0
#> 3       P3        0        1        1        0        1        1        0
#> 4       P4        1        0        0        1        0        0        0
#> 5       P5        1        1        0        1        0        0        1
#> 6       P6        1        1        1        1        1        1        1
#> 7       P7        1        0        1        0        1        0        0
#> 8       P8        0        0        1        1        1        1        0
#> 9       P9        0        0        0        0        1        0        1
#> 10     P10        0        1        0        0        0        0        1
#>    Peptide8 Peptide9 Peptide10
#> 1         1        0         1
#> 2         1        1         1
#> 3         0        1         1
#> 4         1        1         1
#> 5         0        1         1
#> 6         1        1         1
#> 7         0        0         0
#> 8         1        1         1
#> 9         1        0         0
#> 10        1        1         1

# Count the zeros in each row (columns 2 to 11 hold the peptide values)
CountZeros <- function(x) sum(x == 0)
Zeros <- apply(X = DF[, 2:11], 1, CountZeros)
Zeros
#>  [1] 4 4 4 5 4 0 7 3 7 5
# Keep the rows with fewer than 5 zeros (5 of 10 would be 50%, above the 45% cutoff)
DF_filtered <- DF[Zeros < 5, ]

Created on 2020-05-07 by the reprex package (v0.3.0)
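
The example above filters rows (participants) by a count of zeros. Since the question asks about peptides, which are the columns, the same idea can be applied column-wise with a proportion instead of a count. The following is only a minimal sketch: the data frame `dat`, its three columns, and the names `zero_frac` and `dat_filtered` are made up for illustration, with values copied from the question's example. It assumes that "more than 45% zeros" means a peptide is dropped only when its zero fraction is strictly above 0.45.

```r
# Hypothetical data mirroring the question: rows are participants, columns are peptides.
dat <- data.frame(
  Peptide1 = c(0.06637717, 0.09303097, 0, 0, 0),
  Peptide2 = c(0.22459742, 0.23616882, 0.16519945, 0.15727851, 0.01544657),
  Peptide3 = c(0.05149364, 0.04413919, 0.17175571, 0.09602593, 0.19246035),
  row.names = paste0("P", 1:5)
)

# dat == 0 is a logical matrix; colMeans() of it gives the fraction of zeros per peptide.
zero_frac <- colMeans(dat == 0)

# Keep peptides with at most 45% zeros (assumed cutoff direction).
# drop = FALSE keeps the result a data frame even if only one column survives.
dat_filtered <- dat[, zero_frac <= 0.45, drop = FALSE]
```

Here `Peptide1` has 3 zeros out of 5 values (60%), so it is removed, while the other two columns are kept. For a data frame with an ID column, such as `DF` above, the proportion would need to be computed on the peptide columns only, for example on `DF[, -1]`.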

Thank you very much for your help.
