# Correlation between a binary (nominal) variable and a continuous numerical one

Hi! I have a question about statistics. I have a dataset of 13 metabolites (numerical variables) and some infant characteristics such as weight and gender. For weight, I computed a Pearson correlation (r), but I have read that this is not possible for nominal categories. Does anyone know which test would be appropriate for assessing the correlation between a binary variable and a numerical one?

Thank you so much!

Hi there!

I would consider a point-biserial correlation coefficient. It measures the strength and direction of the relationship between a binary variable and a continuous variable. I am not sure if this is what you are looking for, but it was my first guess. Here is an example of how to calculate it in R, using a random dataset I created with just one metabolite variable:

```r
# Create a random dataset
set.seed(123)
n <- 100

weight <- rnorm(n, mean = 45, sd = 15)
gender <- sample(c("female", "male"), n, replace = TRUE)
age <- sample(8:16, n, replace = TRUE)

Metabolities <- sample(10:50, n, replace = TRUE)

# Create a data frame
dataset <- data.frame(weight, gender, age, Metabolities)

# Pearson correlation between two continuous variables
correlation_pearson <- cor(Metabolities, weight)
print(correlation_pearson)

# Convert the binary variable to a factor
gender <- factor(gender)

# First the continuous variable, then the binary one.
# as.numeric() on the factor codes the levels as female = 1, male = 2.
correlation_biserial <- cor(Metabolities, as.numeric(gender))
print(correlation_biserial)
```
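Why `as.numeric(gender)` works even though it produces 1/2 codes rather than 0/1: Pearson's r does not change under a positive linear recoding of either variable, and swapping which category gets the higher code only flips the sign. A minimal check on simulated data (the `metabolite` vector is a hypothetical stand-in, not real measurements):

```r
# Pearson's r is unchanged by a positive linear recoding of either variable,
# so the 1/2 codes from as.numeric(factor) give the same r as 0/1 codes.
set.seed(123)
n <- 100
gender <- factor(sample(c("female", "male"), n, replace = TRUE))
metabolite <- rnorm(n, mean = 30, sd = 10)  # hypothetical continuous variable

codes_12 <- as.numeric(gender)           # levels sorted: female = 1, male = 2
codes_01 <- as.numeric(gender == "male") # explicit 0/1 coding, male = 1

r_12 <- cor(metabolite, codes_12)
r_01 <- cor(metabolite, codes_01)
all.equal(r_12, r_01)   # TRUE: codes_12 is just codes_01 + 1

# Reversing the coding (female = 1, male = 0) only flips the sign.
r_rev <- cor(metabolite, as.numeric(gender == "female"))
all.equal(r_rev, -r_01) # TRUE
```

So the sign of the coefficient only tells you which group has the higher mean under the coding you chose; check the factor levels before interpreting it.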

```r
set.seed(42)

binary = sample(0:1, 1e4, replace = TRUE)
continuous = rnorm(1e4)

# cor.test() drops incomplete pairs on its own; it has no `use` argument
cor.test(binary, continuous, method = "pearson")
#>
#>  Pearson's product-moment correlation
#>
#> data:  binary and continuous
#> t = 1.8129, df = 9998, p-value = 0.06988
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#>  -0.001472674  0.037714589
#> sample estimates:
#>        cor
#> 0.01812792
```

Created on 2023-05-16 with reprex v2.0.2
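On the question of which test fits: testing a Pearson (point-biserial) correlation against zero gives the same t statistic, degrees of freedom and p-value as Student's two-sample t-test with pooled variance (`var.equal = TRUE`) comparing the continuous variable between the two groups. A sketch reusing the simulated data from the reprex above:

```r
set.seed(42)
binary <- sample(0:1, 1e4, replace = TRUE)
continuous <- rnorm(1e4)

# Correlation test with the 0/1 variable
pearson <- cor.test(binary, continuous, method = "pearson")

# Pooled-variance two-sample t-test; group 1 is passed first so that
# the sign of t matches the sign of the correlation.
student <- t.test(continuous[binary == 1], continuous[binary == 0],
                  var.equal = TRUE)

all.equal(unname(pearson$statistic), unname(student$statistic)) # TRUE
all.equal(pearson$p.value, student$p.value)                     # TRUE
```

Which one to report is largely a matter of presentation: the correlation expresses the effect on the r scale, while the t-test reports the difference in group means.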

Thank you all so much! However, I have one more question. I ran the test both with `correlation_biserial <- cor(M1, as.numeric(gender))` and with technocrat's `cor.test()` call. The results are the same, and I cannot see where in `cor.test()` I said that I wanted a biserial correlation:

```r
# Variables
variables <- c("FL_2", "FL_3", "SL_3", "SL_6", "DSLNT", "LDFT", "LNDFH_I", "LNDFH_II", "LNFP_V", "LNT", "LNnFP_V", "LNnT", "Lactose")

# Diabetes: binary variable
Diabetes <- Conce_HM_$Diabetes

# Loop
for (var in variables) {
  correlation_pearson <- cor.test(Diabetes, Conce_HM_[[var]], method = "pearson", use = "pairwise.complete.obs")

  # Extract correlation coefficient and p-value
  correlation_coefficient <- correlation_pearson$estimate
  p_value <- correlation_pearson$p.value

  # Print the results
  cat("Pearson's correlation between Diabetes and", var, ":\n")
  cat("Correlation coefficient:", correlation_coefficient, "\n")
  cat("p-value:", p_value, "\n\n")
}
```

I only specified that I wanted a Pearson correlation, so I am a bit confused.

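Pearson's correlation and the point-biserial correlation are mathematically equivalent: Pearson's r computed with the binary variable coded as two numbers (0/1, or the 1/2 that `as.numeric()` gives for a factor) *is* the point-biserial coefficient, which is why `cor.test()` never needed to be told. A different coefficient, the biserial correlation, is preferred when the dichotomy has been induced (for example, by cutting a continuous variable at a threshold) rather than being natural. Because `gender` has, until recently, been treated as naturally dichotomous, no special coefficient is needed. However, if gender is conceptualized in the modern sense of a social construct, treating it as dichotomous would not be appropriate. I have removed the offending example from my answer.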
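To see the equivalence concretely: with the binary variable coded 0/1, Pearson's r reduces algebraically to the textbook point-biserial formula

$$
r_{pb} = \frac{M_1 - M_0}{s_n}\sqrt{\frac{n_1 n_0}{n^2}}
$$

where $M_1$ and $M_0$ are the means of the continuous variable in each group, $n_1$ and $n_0$ are the group sizes, $n = n_1 + n_0$, and $s_n$ is the standard deviation of the continuous variable computed with divisor $n$. A quick numerical check in R, on simulated data only:

```r
set.seed(42)
binary <- sample(0:1, 1e4, replace = TRUE)
continuous <- rnorm(1e4)

n  <- length(binary)
n1 <- sum(binary == 1)
n0 <- sum(binary == 0)
m1 <- mean(continuous[binary == 1])
m0 <- mean(continuous[binary == 0])
s_n <- sqrt(mean((continuous - mean(continuous))^2))  # SD with divisor n

# Point-biserial formula vs. ordinary Pearson correlation
r_pb <- (m1 - m0) / s_n * sqrt(n1 * n0 / n^2)
all.equal(r_pb, cor(binary, continuous))  # TRUE
```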
