Does the presence of "0" in a data matrix cause wrong output in Spearman's correlation calculation?

Hi community!!!
I have a table showing the relative abundance of many different bacteria. I wanted to create a Spearman correlation matrix with significance levels (P-values) from this table using the "Hmisc" package. I imported the table from an .xlsx file and created a data frame from it. After that, I created the correlation matrix using the following code:

res2 <- rcorr(my_data, type = c("spearman"))

But in my input data table, there are many bacteria with zero (0) relative abundance. I have kept them as they are (i.e., 0). Will this result in a wrong calculation and wrong output?

Here's a short portion from my data table:

clade_name ERR1_profile ERR2_profile ERR3_profile ERR4_profile ERR5_profile ERR6_profile
Actinobaculum_sp_oral_taxon_183 0 0 0 0 0 0
Actinomyces_graevenitzii 0 0 0 0 0 0
Actinomyces_naeslundii 0 0 0 0 0.00269 0
Actinomyces_odontolyticus 0 0 0.00341 0 0.03155 0
Actinomyces_oris 0 0 0 0.00155 0.00186 0
Actinomyces_sp_HMSC035G02 0 0 0 0 0.0066 0
Actinomyces_sp_HPA0247 0 0 0 0 0 0
Actinomyces_sp_ICM47 0 0 0.0042 0 0 0
Actinomyces_sp_S6_Spd3 0 0 0 0 0 0
Actinomyces_sp_oral_taxon_181 0 0 0 0 0 0
Actinomyces_sp_oral_taxon_414 0 0 0 0 0 0
Actinomyces_turicensis 0 0 0 0 0 0
Varibaculum_cambriense 0 0 0 0 0 0
Aeriscardovia_aeriphila 0.00454 0 0 0.00593 0.00257 0
Alloscardovia_omnicolens 0 0 0 0 0 0

Thanks,
dc7

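rcorr will run correctly with 0 entries. (The call below uses the default type = "pearson"; zeros are handled as ordinary values with type = "spearman" too.)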

library(Hmisc)
#> Loading required package: lattice
#> Loading required package: survival
#> Loading required package: Formula
#> Loading required package: ggplot2
#> 
#> Attaching package: 'Hmisc'
#> The following objects are masked from 'package:base':
#> 
#>     format.pval, units
x <- c(-2, -1, 0, 1, 2)
y <- c(4,   1, 0, 1, 4)
z <- c(1,   2, 3, 4, NA)
v <- c(1,   2, 3, 4, 5)
rcorr(cbind(x,y,z,v))
#>   x     y     z v
#> x 1  0.00  1.00 1
#> y 0  1.00 -0.75 0
#> z 1 -0.75  1.00 1
#> v 1  0.00  1.00 1
#> 
#> n
#>   x y z v
#> x 5 5 4 5
#> y 5 5 4 5
#> z 4 4 4 4
#> v 5 5 4 5
#> 
#> P
#>   x      y      z      v     
#> x        1.0000 0.0000 0.0000
#> y 1.0000        0.2546 1.0000
#> z 0.0000 0.2546        0.0000
#> v 0.0000 1.0000 0.0000

Created on 2020-08-11 by the reprex package (v0.3.0)
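
Since the question is about Spearman specifically, here is a minimal sketch of how zeros enter a rank-based correlation. It uses base R's rank() and cor() rather than rcorr so that it runs without extra packages. The two abundance vectors are rows copied from the table in the question; the variable names are only illustrative. Zeros are ordinary values that get tied (averaged) ranks, and a row that is 0 in every sample has no variance, so its correlation with anything is undefined.

```r
# Two rows taken from the table in the question (six samples each).
odontolyticus <- c(0, 0, 0.00341, 0, 0.03155, 0)
oris          <- c(0, 0, 0, 0.00155, 0.00186, 0)
all_zero      <- c(0, 0, 0, 0, 0, 0)

# Spearman works on ranks: the four tied zeros share the average rank 2.5.
ranks <- rank(odontolyticus)
print(ranks)
#> [1] 2.5 2.5 5.0 2.5 6.0 2.5

# Spearman correlation between the two taxa; zeros are used like any other value.
rho <- cor(odontolyticus, oris, method = "spearman")
print(rho)

# A taxon that is 0 everywhere is constant, so the correlation is NA
# (cor() also warns that the standard deviation is zero).
rho_const <- suppressWarnings(cor(odontolyticus, all_zero, method = "spearman"))
print(rho_const)
#> [1] NA
```

So zeros do not make the calculation wrong in themselves, but with many tied zeros the ranks carry little information, and all-zero rows cannot be correlated at all.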

rcorr has no way of knowing, however, whether a 0 correctly encodes an observed value of 0 or a non-observation, which should always be coded as NA.

library(Hmisc)
#> Loading required package: lattice
#> Loading required package: survival
#> Loading required package: Formula
#> Loading required package: ggplot2
#> 
#> Attaching package: 'Hmisc'
#> The following objects are masked from 'package:base':
#> 
#>     format.pval, units
x <- c(-2, -1, NA, 1, 2)
y <- c(4,   1, NA, 1, 4)
z <- c(1,   2, 3, 4, NA)
v <- c(1,   2, 3, 4, 5)
rcorr(cbind(x,y,z,v))
#>   x     y     z v
#> x 1  0.00  1.00 1
#> y 0  1.00 -0.76 0
#> z 1 -0.76  1.00 1
#> v 1  0.00  1.00 1
#> 
#> n
#>   x y z v
#> x 4 4 3 4
#> y 4 4 3 4
#> z 3 3 4 4
#> v 4 4 4 5
#> 
#> P
#>   x      y      z      v     
#> x        1.0000 0.0000 0.0000
#> y 1.0000        0.4544 1.0000
#> z 0.0000 0.4544        0.0000
#> v 0.0000 1.0000 0.0000

Created on 2020-08-11 by the reprex package (v0.3.0)

See also help(rcorr):

rcorr returns a list with elements r, the matrix of correlations, n the matrix of number of observations used in analyzing each pair of variables, and P, the asymptotic P-values. Pairs with fewer than 2 non-missing values have the r values set to NA. The diagonals of n are the number of non-NAs for the single variable corresponding to that row and column.
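
To connect the help text to the table in the question, here is a rough sketch of preparing the input and reading the three elements. It assumes the goal is taxon-by-taxon correlation, which the question does not state; in that case the taxa have to be the columns, so the table is transposed. The data frame is rebuilt by hand from three rows of the posted table, and dropping constant (for example all-zero) taxa is my own suggestion rather than something rcorr requires. The rcorr call is guarded so the snippet still runs if Hmisc is not installed.

```r
# Three rows copied from the table in the question; columns are samples.
my_data <- data.frame(
  clade_name   = c("Actinomyces_odontolyticus", "Actinomyces_oris",
                   "Aeriscardovia_aeriphila"),
  ERR1_profile = c(0,       0,       0.00454),
  ERR2_profile = c(0,       0,       0),
  ERR3_profile = c(0.00341, 0,       0),
  ERR4_profile = c(0,       0.00155, 0.00593),
  ERR5_profile = c(0.03155, 0.00186, 0.00257),
  ERR6_profile = c(0,       0,       0)
)

# rcorr correlates the COLUMNS of a numeric matrix, so drop the name column,
# convert to a matrix, and transpose so that samples are rows and taxa are columns.
m <- t(as.matrix(my_data[, -1]))
colnames(m) <- my_data$clade_name

# Assumption: taxa with a single distinct value (e.g. all 0) are dropped,
# because a constant column has no defined correlation.
keep <- apply(m, 2, function(col) length(unique(col)) > 1)
m <- m[, keep, drop = FALSE]

if (requireNamespace("Hmisc", quietly = TRUE)) {
  res <- Hmisc::rcorr(m, type = "spearman")
  print(res$r)  # correlation matrix
  print(res$n)  # number of observations used for each pair
  print(res$P)  # asymptotic P-values
}
```

If sample-by-sample correlation is what you want instead, skip the t() step and keep the samples as columns.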

Sorry, I don't understand what you mean by an observation of 0 and a non-observation. Also, I see that the results in the two cases are different.

Thanks

Observation: The number of occurrences was counted, and it was 0.
Non-observation: The number of occurrences was not counted, but it was coded as 0 anyway.
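
The two results differ because NA entries are dropped pair by pair, which is why the n matrix is smaller in the second run, while a 0 is kept and used like any other value. A minimal sketch of recoding non-observations follows. The `assayed` flag is hypothetical: only you can know which zeros were really measured, and the values here are made up for illustration.

```r
# One row from the table in the question (six samples).
abundance <- c(0, 0, 0.00341, 0, 0.03155, 0)

# Hypothetical flag: TRUE where the taxon was actually measured in that sample.
# Here sample 4 is assumed not to have been measured.
assayed <- c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE)

# Keep true zeros as 0, but recode non-observations as NA.
abundance_clean <- abundance
abundance_clean[!assayed] <- NA
print(abundance_clean)

# Number of values that would remain available for this variable.
n_used <- sum(!is.na(abundance_clean))
print(n_used)
#> [1] 5
```

If every 0 in your table means "looked for it and found none", leave the zeros as they are.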
