Summary frequencies per column

Hi,
I have a matrix with 10000 columns and 400 rows. The values in the matrix are 1, 2, or 3.
For each column, I want to calculate the frequencies of 1, 2, and 3.

Here is a smaller matrix, with only 5 columns and 7 rows, to demonstrate the structure:

matrix(sample(1:3, 35, replace=TRUE), nrow=7, ncol=5)
#
#     [,1] [,2] [,3] [,4] [,5]
#[1,]    1    3    3    3    1
#[2,]    3    1    3    2    2
#[3,]    2    2    3    1    2
#[4,]    3    1    3    2    3
#[5,]    2    1    2    3    3
#[6,]    2    2    2    2    1
#[7,]    3    2    3    2    3

The result should look something like this, with one row per value (1, 2, 3) and one column per column of the original matrix:

#      [,1] [,2] [,3] [,4] [,5]
#[1,]  1    3    0    1    2
#[2,]  3    3    2    4    2
#[3,]  3    1    5    2    3

I know the table() and prop.table() functions, but they work on a single vector. I could run a loop over the columns, but since it is a very big dataset, I would prefer to avoid loops if possible.

You can use apply() to repeatedly apply table() to each column of the matrix. Converting to a factor prevents table() from leaving out values that do not appear in a particular column.
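To see why the factor conversion matters, here is a small sketch with a hypothetical vector `x` that contains no 3s (the names `x`, `plain`, and `padded` are made up for illustration). Without the factor, table() returns only two counts for such a column, and apply() can no longer bind columns of different lengths into a matrix; it falls back to returning a list.

```r
x <- c(1, 2, 2, 1, 1)  # hypothetical column with no 3s

table(x)                        # reports only the values that occur: 1 and 2
table(factor(x, levels = 1:3))  # reports all three levels, with a 0 count for 3

# Strip the table class to compare the raw counts
plain  <- as.vector(table(x))                         # length 2
padded <- as.vector(table(factor(x, levels = 1:3)))   # length 3
```

Fixing the levels up front guarantees every column yields a vector of length 3, which is what lets apply() return a 3-row matrix.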

set.seed(42)

data <- matrix(sample(1:3, 35, replace = TRUE), nrow = 7, ncol = 5)

data
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    1    2    3    2
#> [2,]    1    3    3    1    2
#> [3,]    1    3    3    1    3
#> [4,]    1    1    1    2    3
#> [5,]    2    1    1    3    2
#> [6,]    2    2    3    2    2
#> [7,]    2    2    1    1    2

apply(data, MARGIN = 2, FUN = function(x) table(factor(x, levels = 1:3)))
#>   [,1] [,2] [,3] [,4] [,5]
#> 1    4    3    3    3    0
#> 2    3    2    1    2    5
#> 3    0    2    3    2    2

Created on 2020-06-17 by the reprex package (v0.3.0)
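Since the question also mentions prop.table(), the counts can be turned into per-column proportions in one more step. This is a minimal sketch, assuming the same `data` matrix as in the reprex above; `counts` and `props` are names introduced here for illustration.

```r
set.seed(42)
data <- matrix(sample(1:3, 35, replace = TRUE), nrow = 7, ncol = 5)

# Count 1s, 2s and 3s in each column (same call as in the reprex above)
counts <- apply(data, MARGIN = 2, FUN = function(x) table(factor(x, levels = 1:3)))

# Divide each column by its own total, so every column of props sums to 1
props <- prop.table(counts, margin = 2)

props
```

Because every column has the same number of rows, `counts / nrow(data)` should give the same result here; prop.table() with `margin = 2` just makes the intent explicit.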


I also used apply but wrote a different function to do the counting.

set.seed(1)
MAT <- matrix(sample(1:3, 35, replace=TRUE), nrow=7, ncol=5)
MAT
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    2    2    1    2
#> [2,]    3    2    2    1    2
#> [3,]    1    3    2    1    2
#> [4,]    2    3    2    1    1
#> [5,]    1    1    3    2    3
#> [6,]    3    1    1    1    1
#> [7,]    3    1    3    1    3
MyFunc <- function(vec) {
  c(sum(vec == 1), sum(vec == 2), sum(vec == 3))
}
apply(X = MAT, MARGIN = 2, FUN = MyFunc)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    3    3    1    6    2
#> [2,]    1    2    4    1    3
#> [3,]    3    2    2    0    2

Created on 2020-06-16 by the reprex package (v0.3.0)
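For a matrix of the size described in the question, another option is to compare the whole matrix against each value and let colSums() do the counting, so the only explicit iteration is over the three values rather than over the columns. This is a sketch under the assumption that the values are exactly 1, 2, and 3; `vals` and `counts` are names chosen for illustration, and I have not benchmarked it against the apply() versions.

```r
set.seed(1)
MAT <- matrix(sample(1:3, 35, replace = TRUE), nrow = 7, ncol = 5)

vals <- 1:3  # assumed set of possible values

# MAT == v is a logical matrix; colSums() counts the TRUEs in each column.
# sapply() returns one column per value, so transpose to get one row per value.
counts <- t(sapply(vals, function(v) colSums(MAT == v)))
rownames(counts) <- vals

counts
```

The result has the same layout as the apply() output above: one row per value and one column per column of `MAT`.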

Thank you for the quick responses, Siddharth and FJCC.
