Hi,
I am working with an R data frame that has features (gene IDs) as row names and samples as columns. I would like to count, for each column, how many values pass a given cut-off. For instance, with the small example data shown below, I count the values with value >= 1 in every column.
In addition to this cut-off, I want the same counts for the cut-offs value >= 5, value >= 10, and value >= 50, and then collect the counts for all cut-offs in one table. That table would help me plot a grouped barplot for comparison.
Is there a way to do this with data manipulation packages such as dplyr or the wider tidyverse? For now, I can only think of repeating the same steps for each cut-off value one by one. Can this be handled in a few simple steps?
Thank you,
Toufiq
library(tidyverse)
## Input data
dput(Data)
structure(list(S1 = c(0L, 0L, 0L, 0L, 0L, 0L, 11L, 15L, 19L,
0L, 100L, 50L, 10L, 100L, 50L), S2 = c(0L, 0L, 2L, 3L, 4L, 0L,
12L, 16L, 20L, 23L, 1000L, 50L, 10L, 50L, 50L), S3 = c(1L, 0L,
9L, 0L, 0L, 0L, 13L, 17L, 21L, 0L, 100000L, 40L, 10L, 100000L,
50L), S4 = c(1L, 0L, 9L, 0L, 0L, 0L, 14L, 18L, 22L, 0L, 22L,
60L, 10L, 0L, 100000L)), class = "data.frame", row.names = c("Gene_1",
"Gene_2", "Gene_3", "Gene_4", "Gene_5", "Gene_6", "Gene_7", "Gene_8",
"Gene_9", "Gene_10", "Gene_11", "Gene_12", "Gene_13", "Gene_14",
"Gene_15"))
#>          S1   S2     S3     S4
#> Gene_1    0    0      1      1
#> Gene_2    0    0      0      0
#> Gene_3    0    2      9      9
#> Gene_4    0    3      0      0
#> Gene_5    0    4      0      0
#> Gene_6    0    0      0      0
#> Gene_7   11   12     13     14
#> Gene_8   15   16     17     18
#> Gene_9   19   20     21     22
#> Gene_10   0   23      0      0
#> Gene_11 100 1000 100000     22
#> Gene_12  50   50     40     60
#> Gene_13  10   10     10     10
#> Gene_14 100   50 100000      0
#> Gene_15  50   50     50 100000
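## Count values >= 1 in each sample
Data.v1 <-
  Data %>%
  gather(x, value, S1:S4) %>%   # reshape to long format: one row per gene/sample value
  group_by(x) %>%
  tally(value >= 1)             # sums the logical vector, i.e. counts TRUE values per sample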
dput(Data.v1)
structure(list(x = c("S1", "S2", "S3", "S4"), n = c(8L, 12L,
10L, 9L)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-4L))
#> # A tibble: 4 × 2
#>   x         n
#>   <chr> <int>
#> 1 S1        8
#> 2 S2       12
#> 3 S3       10
#> 4 S4        9
Created on 2023-02-09 with reprex v2.0.2
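One possible way to avoid repeating the count by hand for every cut-off is sketched below. It is only an illustration under a few assumptions: the names cutoffs, counts_long, counts_wide and p are made up for this example, the data frame is rebuilt from the dput() output above so the block runs on its own, and pivot_longer() is swapped in for gather(). The idea is to loop over the cut-off values, count per sample for each one, stack the results into one long table, and reshape that into one column per cut-off.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Same values as the dput() output in the post
Data <- data.frame(
  S1 = c(0L, 0L, 0L, 0L, 0L, 0L, 11L, 15L, 19L, 0L, 100L, 50L, 10L, 100L, 50L),
  S2 = c(0L, 0L, 2L, 3L, 4L, 0L, 12L, 16L, 20L, 23L, 1000L, 50L, 10L, 50L, 50L),
  S3 = c(1L, 0L, 9L, 0L, 0L, 0L, 13L, 17L, 21L, 0L, 100000L, 40L, 10L, 100000L, 50L),
  S4 = c(1L, 0L, 9L, 0L, 0L, 0L, 14L, 18L, 22L, 0L, 22L, 60L, 10L, 0L, 100000L),
  row.names = paste0("Gene_", 1:15)
)

# Cut-offs from the post (hypothetical variable name)
cutoffs <- c(1, 5, 10, 50)

# Long table: one row per sample/cut-off pair, n = number of values >= cut-off
counts_long <- lapply(cutoffs, function(k) {
  Data %>%
    pivot_longer(everything(), names_to = "sample", values_to = "value") %>%
    group_by(sample) %>%
    summarise(n = sum(value >= k), .groups = "drop") %>%
    mutate(cutoff = k)
}) %>%
  bind_rows()

# Wide table: one row per sample, one column per cut-off
counts_wide <- counts_long %>%
  mutate(cutoff = paste0(">=", cutoff)) %>%
  pivot_wider(names_from = cutoff, values_from = n)

print(counts_wide)

# Grouped barplot: one group of bars per sample, one bar per cut-off
p <- ggplot(counts_long, aes(x = sample, y = n, fill = factor(cutoff))) +
  geom_col(position = "dodge") +
  labs(x = "Sample", y = "Values passing cut-off", fill = "Cut-off (>=)")

print(p)
```

The long table is the shape ggplot2 expects for a grouped barplot, while the wide table is easier to read. If the tidyverse is not a hard requirement, sapply(cutoffs, function(k) colSums(Data >= k)) should give the same counts as a plain matrix.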