Hi,
I am trying to sum rows by a grouping column in an R data frame, but I get incorrect values when the data contain `NA` values. For instance, please see the results for the `Gene_A` and `Gene_E` rows: the values are not summed as they should be.
Please assist.
Thank you,
Toufiq
dput(Transcript_data)
structure(list(Gene_Symbol = c("Gene_A", "Gene_A", "Gene_A",
"Gene_A", "Gene_C", "Gene_C", "Gene_D", "Gene_E", "Gene_E", "Gene_E"
), Sample_1 = c(3L, 0L, NA, NA, 28L, 6L, 310L, 2L, 21L, NA),
Sample_2 = c(1L, 0L, 26L, NA, 25L, 8L, 177L, 4L, 15L, 26L
), Sample_3 = c(1L, 0L, 43L, NA, 24L, 5L, 246L, 17L, 17L,
NA), Sample_4 = c(1L, 0L, NA, NA, 27L, 7L, 231L, 6L, 9L,
47L), Sample_5 = c(0L, 0L, NA, NA, 24L, 6L, 188L, 4L, 14L,
28L)), class = "data.frame", row.names = c(NA, -10L))
#>    Gene_Symbol Sample_1 Sample_2 Sample_3 Sample_4 Sample_5
#> 1       Gene_A        3        1        1        1        0
#> 2       Gene_A        0        0        0        0        0
#> 3       Gene_A       NA       26       43       NA       NA
#> 4       Gene_A       NA       NA       NA       NA       NA
#> 5       Gene_C       28       25       24       27       24
#> 6       Gene_C        6        8        5        7        6
#> 7       Gene_D      310      177      246      231      188
#> 8       Gene_E        2        4       17        6        4
#> 9       Gene_E       21       15       17        9       14
#> 10      Gene_E       NA       26       NA       47       28
dput(Transcript_data_Sum)
structure(list(Gene_Symbol = c("Gene_A", "Gene_C", "Gene_D",
"Gene_E"), Sample_1 = c(3L, 34L, 310L, 23L), Sample_2 = c(1L,
33L, 177L, 19L), Sample_3 = c(1L, 29L, 246L, 34L), Sample_4 = c(1L,
34L, 231L, 15L), Sample_5 = c(0L, 30L, 188L, 18L)), row.names = c(NA,
-4L), class = "data.frame")
#>   Gene_Symbol Sample_1 Sample_2 Sample_3 Sample_4 Sample_5
#> 1      Gene_A        3        1        1        1        0
#> 2      Gene_C       34       33       29       34       30
#> 3      Gene_D      310      177      246      231      188
#> 4      Gene_E       23       19       34       15       18
To sum by the `Gene_Symbol` column, I used:
`Transcript_data_Sum <- aggregate(. ~ Gene_Symbol, Transcript_data, sum)`
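A possible explanation, offered as a sketch rather than a confirmed diagnosis: the formula method of `aggregate()` defaults to `na.action = na.omit`, so any row containing an `NA` is dropped before `sum` is applied. That matches the output above, where `Gene_A` is the sum of rows 1 and 2 only and `Gene_E` is the sum of rows 8 and 9 only. One way to keep those rows is to pass `na.action = na.pass` and forward `na.rm = TRUE` to `sum`. This assumes that an `NA` should count as zero within its own column, which may not be what the analysis needs.

```r
# Rebuild the example data from the dput() output above.
Transcript_data <- data.frame(
  Gene_Symbol = c("Gene_A", "Gene_A", "Gene_A", "Gene_A", "Gene_C",
                  "Gene_C", "Gene_D", "Gene_E", "Gene_E", "Gene_E"),
  Sample_1 = c(3L, 0L, NA, NA, 28L, 6L, 310L, 2L, 21L, NA),
  Sample_2 = c(1L, 0L, 26L, NA, 25L, 8L, 177L, 4L, 15L, 26L),
  Sample_3 = c(1L, 0L, 43L, NA, 24L, 5L, 246L, 17L, 17L, NA),
  Sample_4 = c(1L, 0L, NA, NA, 27L, 7L, 231L, 6L, 9L, 47L),
  Sample_5 = c(0L, 0L, NA, NA, 24L, 6L, 188L, 4L, 14L, 28L)
)

# na.action = na.pass keeps rows that contain NA (the default, na.omit,
# removes them); na.rm = TRUE is forwarded to sum() so that NAs are
# skipped column by column instead of discarding the whole row.
Transcript_data_Sum <- aggregate(. ~ Gene_Symbol, Transcript_data, sum,
                                 na.rm = TRUE, na.action = na.pass)

print(Transcript_data_Sum)
#>   Gene_Symbol Sample_1 Sample_2 Sample_3 Sample_4 Sample_5
#> 1      Gene_A        3       27       44        1        0
#> 2      Gene_C       34       33       29       34       30
#> 3      Gene_D      310      177      246      231      188
#> 4      Gene_E       23       45       34       62       46
```

Note that with `na.rm = TRUE` a group whose values are all `NA` in a column sums to 0 rather than `NA`.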
Created on 2022-12-02 with reprex v2.0.2