Correlation between a nominal category (binary) and a continuous numerical one

mendigorri · May 16, 2023, 4:35pm

Hi! I have a question regarding statistics. I have a data set of 13 metabolites (numerical variables) plus infant characteristics such as weight and gender. For weight, I have computed a Pearson r correlation, but I have read that this is not possible for nominal categories. Does anyone know which test would be suitable for assessing the correlation between a binary category and a numerical variable?

Thank you so much!

Hannahhere · May 16, 2023, 8:53pm

Hi there!

I would consider the point-biserial correlation coefficient. It measures the strength and direction of the relationship between a binary variable and a continuous variable. I am not sure if this is what you are looking for, but it was my first guess. Here is an example of how to calculate it in R, using a random dataset I created and just one metabolite variable:

#create random dataset
set.seed(123)
n <- 100

weight <- rnorm(n, mean = 45, sd = 15)
gender <- sample(c("female", "male"), n, replace = TRUE)
age <- sample(8:16, n, replace = TRUE)

Metabolities <- sample(10:50, n, replace = TRUE)

#create data frame
dataset <- data.frame(weight, gender, age, Metabolities)

#pearson correlation
correlation_pearson <- cor(Metabolities, weight)
print(correlation_pearson)

# Convert the binary variable to a factor
gender <- factor(gender)

# Calculate the point-biserial correlation
# (first argument: continuous variable; second: binary variable coded as numbers)
correlation_biserial <- cor(Metabolities, as.numeric(gender))
print(correlation_biserial)
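
As a rough check that this is the point-biserial coefficient, the sketch below computes the textbook formula by hand and compares it with cor(). The data are simulated, and the names metabolite, m_female, m_male, n_female, n_male, r_formula and r_pearson are illustrative assumptions, not part of the example above.

```r
# Simulated data; all names here are illustrative assumptions
set.seed(123)
n <- 100
gender <- factor(sample(c("female", "male"), n, replace = TRUE))
metabolite <- sample(10:50, n, replace = TRUE)

# Group means and group sizes
m_female <- mean(metabolite[gender == "female"])
m_male   <- mean(metabolite[gender == "male"])
n_female <- sum(gender == "female")
n_male   <- sum(gender == "male")

# Point-biserial formula, written for the sample standard deviation
# (n - 1 denominator), which is what sd() returns
r_formula <- (m_male - m_female) / sd(metabolite) *
  sqrt(n_female * n_male / (n * (n - 1)))

# Pearson correlation with gender coded 1 = female, 2 = male
# (factor levels are sorted alphabetically)
r_pearson <- cor(metabolite, as.numeric(gender))

# TRUE when the two values agree up to floating-point tolerance
all.equal(r_formula, r_pearson)
```

The sign of the coefficient depends only on which group gets the higher numeric code, here "male".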

Hope this will help you.

technocrat · May 16, 2023, 9:00pm

set.seed(42)

binary = sample(0:1,1e4,replace = TRUE)
continuous = rnorm(1e4) 

cor.test(binary, continuous, method = "pearson")
#> 
#>  Pearson's product-moment correlation
#> 
#> data:  binary and continuous
#> t = 1.8129, df = 9998, p-value = 0.06988
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  -0.001472674  0.037714589
#> sample estimates:
#>        cor 
#> 0.01812792

Created on 2023-05-16 with reprex v2.0.2

mendigorri · May 17, 2023, 5:21pm

Thank you all so much! However, I have a question. I ran the test both with correlation_biserial <- cor(M1, as.numeric(gender)) and with technocrat's cor.test call. The results are the same, and I cannot see where I told cor.test that I want a biserial correlation:

# Variables
variables <- c("FL_2", "FL_3", "SL_3", "SL_6", "DSLNT", "LDFT", "LNDFH_I", "LNDFH_II", "LNFP_V", "LNT", "LNnFP_V", "LNnT", "Lactose")

# Diabetes: binary variable
Diabetes <- Conce_HM_$Diabetes

# Loop over the metabolites
for (var in variables) {
  # cor.test() drops incomplete pairs itself, so no `use` argument is needed
  correlation_pearson <- cor.test(Diabetes, Conce_HM_[[var]], method = "pearson")

  # Extract correlation coefficient and p-value
  correlation_coefficient <- correlation_pearson$estimate
  p_value <- correlation_pearson$p.value

  # Print the results
  cat("Pearson's correlation between Diabetes and", var, ":\n")
  cat("Correlation coefficient:", correlation_coefficient, "\n")
  cat("p-value:", p_value, "\n\n")
}

I have only asked for a Pearson correlation, so I am a bit confused.

technocrat · May 17, 2023, 7:27pm

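Pearson's r and the point-biserial coefficient are mathematically equivalent: the point-biserial is just Pearson's r computed with one variable that takes only two values, so cor.test with method = "pearson" already returns it. A different coefficient, the biserial correlation, is preferred when the dichotomy has been induced (a continuous variable cut into two groups) rather than natural. Gender has, until recently, been treated as a natural dichotomy, so nothing beyond the point-biserial is needed. However, were gender conceptualized in the modern sense of a social construct, dichotomous treatment would not be appropriate. I have removed the offending example from my answer.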
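
One way to see the equivalence in practice is sketched below on simulated data (the names group, value, ct and tt are illustrative assumptions). The Pearson test on a two-valued variable gives the same p-value as a pooled-variance two-sample t-test, and the numeric codes chosen for the two groups do not affect the size of the coefficient.

```r
# Simulated data; all names here are illustrative assumptions
set.seed(42)
group <- sample(0:1, 200, replace = TRUE)
value <- rnorm(200, mean = 10 + 0.5 * group)

# Pearson (equivalently, point-biserial) correlation test
ct <- cor.test(group, value, method = "pearson")

# Two-sample t-test with pooled variance on the same data
tt <- t.test(value ~ group, var.equal = TRUE)

# The p-values agree, and the t statistics agree up to sign
all.equal(ct$p.value, tt$p.value)
all.equal(abs(unname(ct$statistic)), abs(unname(tt$statistic)))

# Recoding the groups as 1/2 leaves the coefficient unchanged;
# swapping the two labels only flips its sign
all.equal(cor(group + 1, value), cor(group, value))
all.equal(cor(1 - group, value), -cor(group, value))
```

So reporting the correlation or the pooled t-test is mostly a matter of presentation: both test the same hypothesis about the difference between the two group means.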
