I have been trying for a while now to find a way to create a correlation matrix of all variables in my lm model. So far I have only found a way to do it with two variables. For the lmer model I got it automatically with summary(model), but for my lm model it doesn't give it ...
I had non-numeric variables in my lmer model and the summary would show me the correlation matrix for the non-numeric variables too. Not sure how it is done, but it calculated it somehow.
(1) Correlation of non-numeric variables doesn't make any sense.
(2) When I ran the code you posted, it ran fine with no error messages and no non-numeric variables. It's likely the problem is in something else you're doing. Try rm(list=ls()) and then re-running.
and then summary(model) gives me a nice correlation matrix for all variables, as in my example above.
Now with a "basic" linear regression model
model <- lm(Expression ~ Batch + AGE.Group + Sample.Site + Gender, data = df)
summary(model) doesn't give me a correlation matrix, and I just can't find a way to get it since I have non-numeric, categorical variables.
I believe that what you are seeing is the correlation matrix of the estimated coefficients, not the correlation of the variables. I could be wrong though.
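If that is right, the same kind of matrix can be requested for an lm fit. Here is a minimal sketch, assuming a made-up data frame (I don't have your df; only the column names come from your formula, and the factor levels and values are invented). It uses the correlation argument of summary() for lm objects, and cross-checks the result against cov2cor(vcov(model)).

```r
# Toy data standing in for the poster's df (levels and values are made up)
set.seed(1)
n <- 60
df <- data.frame(
  Batch       = factor(sample(c("B1", "B2", "B3"), n, replace = TRUE)),
  AGE.Group   = factor(sample(c("young", "old"), n, replace = TRUE)),
  Sample.Site = factor(sample(c("siteA", "siteB"), n, replace = TRUE)),
  Gender      = factor(sample(c("F", "M"), n, replace = TRUE))
)
df$Expression <- rnorm(n)

model <- lm(Expression ~ Batch + AGE.Group + Sample.Site + Gender, data = df)

# Option 1: ask summary() to compute the correlation of the coefficient estimates
coef_cor <- summary(model, correlation = TRUE)$correlation

# Option 2: rescale the coefficient variance-covariance matrix to correlations
coef_cor2 <- cov2cor(vcov(model))

print(round(coef_cor, 3))
```

Both options should give the same matrix. Note that its rows and columns are coefficients (for example one per non-reference level of Batch), not the original variables.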
@startz I just checked again what it says under summary of my lmer model.
It says "Correlation of Fixed Effects". I thought that meant the correlation of the variables?!
I believe it is the correlation of the coefficients, though I'm not sure. Here's what the documentation says,
correlation
(logical) for vcov, indicates whether the correlation matrix as well as the variance-covariance matrix is desired; for summary.merMod, indicates whether the correlation matrix should be computed and stored along with the covariance; for print.summary.merMod, indicates whether the correlation matrix of the fixed-effects parameters should be printed. In the latter case, when NULL (the default), the correlation matrix is printed when it has been computed by summary(.), and when p <= 20.
Non-numeric variables don't have correlations. But the dummy variables used to estimate fixed effects are numeric. If you have a variable that only takes on two values, then you get one dummy variable for it, and the correlations of those dummy variables can be checked. If your variables take on more than two values, you get more than one dummy for each. The correlations of the dummies can be calculated, but one would have to be very careful thinking about their meanings.
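To make the dummy-variable idea concrete, here is a rough sketch using model.matrix() from base R, which builds the 0/1 columns that lm() uses internally (treatment coding by default). The data frame is again invented for illustration; only the column names are taken from the formula in the question.

```r
# Toy data standing in for the poster's df (levels and values are made up)
set.seed(1)
n <- 60
df <- data.frame(
  Batch       = factor(sample(c("B1", "B2", "B3"), n, replace = TRUE)),
  AGE.Group   = factor(sample(c("young", "old"), n, replace = TRUE)),
  Sample.Site = factor(sample(c("siteA", "siteB"), n, replace = TRUE)),
  Gender      = factor(sample(c("F", "M"), n, replace = TRUE))
)

# Expand each factor into 0/1 dummy columns, one per non-reference level
X <- model.matrix(~ Batch + AGE.Group + Sample.Site + Gender, data = df)

# Drop the intercept column: it is constant, so it has no defined correlation
dummies <- X[, colnames(X) != "(Intercept)", drop = FALSE]

# Pairwise correlations between the dummy columns
dummy_cor <- cor(dummies)
print(round(dummy_cor, 3))
```

Here the three-level Batch factor contributes two dummy columns. Those two are negatively correlated with each other by construction, since a sample cannot be in both batches, which is one reason the numbers need careful interpretation.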
Thank you @startz
It makes sense that I cannot calculate the correlation of the non-numeric variables themselves, but that I can once I have dummy variables.
I am not quite sure what @nirgrahamuk's solution does. It is somehow converting non-numeric to numeric variables, but these are not dummy variables?!
Maybe to solve my issue I need to check how to create dummy variables in R, and then I should be able to run the code posted by @nirgrahamuk.
Thank you. Nice package.
I ran it, but unfortunately it does not seem to work for me, because I need to use my gene expression as the target, and that is not categorical. It looks like it wants one feature of a categorical variable as the target.