Elegant way to estimate and test group correlations

Jason.C · March 18, 2020, 11:56pm

Greetings,

My data have 3 group and, say, two variables. I can estimate and test the correlations for the:

overall sample
each of the 3 groups separately

What I am looking for is an elegant way to estimate and test the group correlations in one go.
For example, I gave this a try - it doesn’t work...

gp.cor <- dat2 %>%
	group_by(group.factor) %>%
	    cor.mtrx3 <-	corr.test(x = dat2[3:4],
							y = NULL,
							use = "everything",
							method = "spearman",
							adjust = "none",
							alpha = 0.05,
							ci = TRUE, minlength = 5)

Any information on resources, packages or examples that may help would be greatly appreciated.

Cheers,
Jason

joels · March 19, 2020, 2:16am

I don't know if this qualifies as elegant, but you can split the data by group and map over the resulting list. For example:

library(tidyverse)
library(psych)

gp.cor <- iris %>%
  split(.$Species) %>%  
  map(~corr.test(x = .x %>% select(Petal.Width, Sepal.Width),
                 use = "everything",
                 method = "spearman",
                 adjust = "none",
                 alpha = 0.05,
                 ci = TRUE, minlength = 5)
  )

gp.cor is a named list with the output of corr.test (which is itself a list) for each level of Species. I included .x %>% select(Petal.Width, Sepal.Width) to limit the correlation matrix to those two columns. But you can change that to just .x and get the correlation matrix for all remaining columns in the data frame (assuming they're all numeric).

Then, to extract various results, you can map over the list. For example, to get the correlation matrix:

map(gp.cor, ~.x$r)

If you want the correlations between all numeric columns and your data has columns that are not numeric, you can exclude the non-numeric columns (which you need to do to avoid an error from corr.test) as follows (where I've used the default corr.test arguments to shorten the code):

gp.cor <- iris %>%
  # Add another non-numeric column
  mutate(new.group = sample(LETTERS, 150, replace=TRUE)) %>% 
  split(.$Species) %>%  
  map(~corr.test(x = .x %>% select_if(is.numeric))
  )

Jason.C · March 19, 2020, 7:12pm

Thank you.
This is excellent, and I think helps me reach about 80% of my goals at the moment.

Now for my inevitable questions.

What is the most appropriate way to now test whether the correlations are significantly different between groups?
What is the most straightforward way to now plot the correlations by group? For example, a multi panel plot?

Thanks to any/everyone who can offer any suggestions or guidance.

Cheers,
Jason

Jason.C · April 1, 2020, 9:23pm

For some reason I cannot get this to work with my data. Using your example, different correlations are found for each species. However, in my data, the same correlations are found for all three groups and I don't know why. Any suggestions?
Jason

p.s. I have no idea why "TRUE" is a label or why it was included when I created this smaller dataset.

Here is my code

gp.cor1 <- datSM1 %>%
  split(datSM1$group.factor) %>%  
  map(~corr.test(datSM1 %>% select(ess_total, bdi_total, mcgill_total),
                 use = "pairwise",
                 method = "spearman",
                 adjust = "none",
                 alpha = 0.05,
                 ci = TRUE, minlength = 5)
  )

print(gp.cor1)

gp.cor1.r <- map(gp.cor1, ~.x$r)
gp.cor1.p <- map(gp.cor1, ~.x$p)
print(x = gp.cor1.r, short = TRUE)
print(x = gp.cor1.p, short = TRUE)

Here are my data

structure(list(group.factor = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 3L, 2L, 2L, 
2L, 3L, 2L, 3L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 2L, 2L, 
2L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 3L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 3L, 1L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 2L, 2L, 3L, 2L, 3L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 1L, 3L, 1L, 3L, 3L, 1L, 3L, 3L, 1L, 1L, 
1L, 3L, 1L, 1L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 2L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("HC", "CLBP", 
"FM", "TRUE"), class = "factor"), ess_total = structure(c(5L, 
5L, 4L, 7L, 10L, 2L, 5L, 5L, 4L, 9L, 6L, 10L, 8L, 9L, 6L, 8L, 
9L, 3L, 6L, 10L, 8L, 10L, 3L, 0L, 10L, 6L, 6L, 6L, 9L, 6L, 6L, 
4L, 3L, 5L, 3L, 6L, 6L, 5L, 1L, 1L, 3L, 11L, 3L, 8L, 8L, 4L, 
7L, 8L, 5L, 11L, 7L, 16L, 11L, 5L, 9L, 2L, 15L, 6L, 10L, 2L, 
7L, 3L, 8L, 6L, 5L, 1L, 5L, 5L, 8L, 10L, 3L, 1L, 12L, 13L, 12L, 
4L, 7L, 10L, 8L, 14L, 4L, 9L, 6L, 11L, 2L, 6L, 5L, 1L, 10L, 8L, 
2L, 10L, 7L, 6L, 7L, 11L, 9L, 9L, 3L, 3L, 6L, 5L, 2L, 9L, 7L, 
12L, 7L, 7L, 4L, 8L, 9L, 6L, 6L, 7L, 10L, 15L, 2L, 4L, 5L, 14L, 
6L, 14L, 4L, 3L, 14L, 5L, 10L, 8L, 9L, 10L, 9L, 9L, 3L, 5L, 13L, 
2L, 4L, 1L, 4L, 2L, 5L, 3L, 13L, 8L, 7L, 10L, 9L, 11L, 9L, 8L, 
10L, 8L, 12L, 6L, 14L, 3L, 6L, 18L), label = "Epworth Sleepiness Score", class = c("labelled", 
"integer")), bdi_total = structure(c(0L, 1L, 1L, 13L, 5L, 0L, 
1L, 6L, 0L, 7L, 0L, 2L, 0L, 6L, 0L, 3L, 1L, 0L, 0L, 7L, 12L, 
5L, 7L, 0L, 2L, 8L, 1L, 9L, 12L, 4L, 1L, 0L, 0L, 1L, 1L, 0L, 
25L, 0L, 0L, 13L, 1L, 5L, 2L, 7L, 1L, 4L, 22L, 6L, 12L, 2L, 5L, 
10L, 15L, 15L, 12L, 0L, 39L, 14L, 9L, 3L, 16L, 17L, 14L, 0L, 
13L, 0L, 9L, 12L, 13L, 4L, 7L, 24L, 2L, 17L, 5L, 6L, 30L, 28L, 
10L, 25L, 1L, 15L, 14L, 15L, 14L, 1L, 11L, 17L, 3L, 18L, 13L, 
29L, 13L, 12L, 17L, 19L, 1L, 7L, 22L, 0L, 13L, 8L, 4L, 3L, 12L, 
16L, 20L, 8L, 30L, 1L, 18L, 0L, 6L, 24L, 19L, 22L, 8L, 5L, 4L, 
18L, 12L, 13L, 5L, 3L, 8L, 5L, 34L, 23L, 14L, 21L, 5L, 1L, 0L, 
0L, 24L, 0L, 0L, 8L, 4L, 5L, 2L, 6L, 10L, 0L, 4L, 11L, 41L, 16L, 
15L, 1L, 29L, 29L, 16L, 14L, 12L, 13L, 10L, 35L), label = "Beck Depression Inventory Total Score", class = c("labelled", 
"integer")), mcgill_total = structure(c(0L, 0L, 0L, 0L, 0L, 0L, 
9L, 0L, 0L, 4L, 0L, 1L, 0L, 5L, 0L, 0L, 0L, 0L, 0L, 38L, 34L, 
16L, 0L, 0L, 0L, 5L, 2L, 14L, 0L, 0L, 0L, 25L, 0L, 0L, 21L, 0L, 
9L, 0L, 0L, 42L, 12L, 4L, 3L, 44L, 10L, 18L, 50L, 39L, 42L, 9L, 
21L, 7L, 15L, 9L, 18L, 21L, 50L, 0L, 21L, 2L, 18L, 26L, 18L, 
21L, 21L, 23L, 28L, 50L, 7L, 7L, 16L, 12L, 10L, 67L, 39L, 43L, 
39L, 44L, 50L, 43L, 6L, 47L, 20L, 13L, 11L, 30L, 29L, 25L, 39L, 
46L, 39L, 43L, 46L, 39L, 33L, 45L, 0L, 22L, 32L, 0L, 12L, 40L, 
7L, 21L, 56L, 38L, 42L, 40L, 43L, 16L, 44L, 16L, 30L, 40L, 37L, 
51L, 24L, 19L, 41L, 40L, 20L, 36L, 38L, 0L, 3L, 0L, 49L, 44L, 
0L, 36L, 43L, 0L, 0L, 0L, 23L, 0L, 0L, 31L, 0L, 0L, 0L, 3L, 0L, 
0L, 0L, 47L, 32L, 23L, 43L, 14L, 36L, 44L, 48L, 32L, 45L, 43L, 
32L, 50L), label = "McGill Total Score", class = c("labelled", 
"integer"))), row.names = c(NA, -158L), class = c("tbl_df", "tbl", 
"data.frame"))

system · April 22, 2020, 9:23pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.