Binomial Logistic Regression --> problem with multicollinearity

Hi there,
I'm new here and a little bit desperate with my code. I'd like to predict a diagnosis, coded as 0 or 1, from 24 predictors. However, there is strong multicollinearity between the predictors: all VIF values are larger than 15 (10 should be the maximum). Is there an alternative? Also, the predictors are strongly skewed. Thanks so much in advance!

  1. What is your goal? Predict the outcome? Explain which predictor matters?
  2. Why do you think a high vif is a problem?

Hi,
thanks!

  1. My ultimate goal is to develop the best prediction model and then calculate how good the model is (correct classifications, sensitivity, specificity). I would like to predict diagnosis (yes/no) with a logistic regression model with 24 predictors. At first, all predictors in the model seem significant, but after applying the step() function all predictors lose significance. This made me suspicious.
  2. I read it there: Identifying Multicollinearity in Multiple Regression.
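
To make the goal in point 1 concrete, here is a rough sketch of how correct classifications, sensitivity and specificity could be computed from a fitted glm. It runs on simulated data; the names dat, diagnosis, x1 and x2 and the 0.5 cutoff are assumptions for illustration and are not taken from this thread.

```r
# Simulated stand-in data (hypothetical names, not the data from this thread)
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
diagnosis <- rbinom(n, size = 1, prob = plogis(0.5 + 1.5 * x1 - 1 * x2))
dat <- data.frame(diagnosis, x1, x2)

fit <- glm(diagnosis ~ x1 + x2, family = binomial, data = dat)

# Classify at a 0.5 cutoff (an arbitrary choice for illustration)
p_hat <- predict(fit, type = "response")
pred  <- factor(as.integer(p_hat >= 0.5), levels = c(0, 1))
truth <- factor(dat$diagnosis, levels = c(0, 1))

# Rows are predictions, columns are the true classes
conf_mat <- table(predicted = pred, actual = truth)

accuracy    <- sum(diag(conf_mat)) / sum(conf_mat)
sensitivity <- conf_mat["1", "1"] / sum(conf_mat[, "1"])  # TP / (TP + FN)
specificity <- conf_mat["0", "0"] / sum(conf_mat[, "0"])  # TN / (TN + FP)

print(conf_mat)
print(round(c(accuracy = accuracy,
              sensitivity = sensitivity,
              specificity = specificity), 3))
```

These are in-sample figures and therefore optimistic. With only a few dozen observations, estimating them on held-out data or by cross-validation would probably give a more honest picture.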

Logistic regression to predict diagnoses from all AUs

#Test for multicollinearity
library(car)
vif(logit_SIT)
AU06_c_disgust_participant AU06_c_disgust_speaker
15.77364 25.95332
AU06_c_joy_participant AU06_c_joy_speaker
29.20184 23.50569
AU06_r_disgust_participant AU06_r_disgust_speaker
63.52174 126.51286
AU06_r_joy_participant AU06_r_joy_speaker
84.45276 123.04233
AU12_c_disgust_participant AU12_c_disgust_speaker
37.49768 30.19613
AU12_c_joy_participant AU12_c_joy_speaker
102.48436 79.21014
AU12_r_disgust_participant AU12_r_disgust_speaker
59.30972 71.62611
AU12_r_joy_participant AU12_r_joy_speaker
176.53218 157.58539
AU04_c_disgust_participant AU04_c_disgust_speaker
63.48484 46.90484
AU04_c_joy_participant AU04_c_joy_speaker
131.58259 40.15185
AU04_r_disgust_participant AU04_r_disgust_speaker
43.14326 167.06319
AU04_r_joy_participant AU04_r_joy_speaker
96.72787 212.24878

Values < 10 are OK. However, all values are > 15.
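
For readers wondering where such numbers come from, here is a minimal sketch of the textbook VIF definition, 1 / (1 - R^2), where R^2 comes from regressing one predictor on all the others. The data and the names x1, x2, x3 are made up for illustration. As far as I know, car::vif() on a glm object works from the estimated coefficient covariance instead, so its values need not match this calculation exactly.

```r
# Simulated predictors (hypothetical; not the AU variables from this thread)
set.seed(42)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # nearly a copy of x1
x3 <- rnorm(n)                  # unrelated to the other two
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on the rest
vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

print(round(vif_manual, 2))
```

In this sketch x1 and x2 should show large values because each is almost fully explained by the other, while x3 should stay close to 1.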

#Predict diagnoses from all AUs
model_diagnosis <- as.formula(interaction_disorder ~ AU06_c_disgust_participant + AU06_c_disgust_actress + AU06_c_joy_participant + AU06_c_joy_actress + AU06_r_disgust_participant + AU06_r_disgust_actress + AU06_r_joy_participant + AU06_r_joy_actress
                              + AU12_c_disgust_participant + AU12_c_disgust_actress + AU12_c_joy_participant + AU12_c_joy_actress + AU12_r_disgust_participant + AU12_r_disgust_actress + AU12_r_joy_participant + AU12_r_joy_actress
                              + AU04_c_disgust_participant + AU04_c_disgust_actress + AU04_c_joy_participant + AU04_c_joy_actress + AU04_r_disgust_participant + AU04_r_disgust_actress + AU04_r_joy_participant + AU04_r_joy_actress)

logit_SIT <- glm(model_diagnosis, family = binomial, data = data_matched)
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(logit_SIT)

Call:
glm(formula = model_diagnosis, family = binomial, data = data_matched)

Deviance Residuals:
Min 1Q Median 3Q Max
-8.49 0.00 0.00 0.00 8.49

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.238e+14 1.901e+07 -11770990 <2e-16 ***
AU06_c_disgust_participant 5.531e+15 9.827e+07 56279908 <2e-16 ***
AU06_c_disgust_actress 1.252e+15 1.267e+08 9880215 <2e-16 ***
AU06_c_joy_participant -7.538e+15 1.533e+08 -49177013 <2e-16 ***
AU06_c_joy_actress -1.234e+15 1.229e+08 -10046201 <2e-16 ***
AU06_r_disgust_participant -5.754e+15 1.453e+08 -39603970 <2e-16 ***
AU06_r_disgust_actress -2.859e+15 1.634e+08 -17495666 <2e-16 ***
AU06_r_joy_participant 8.157e+15 1.944e+08 41962700 <2e-16 ***
AU06_r_joy_actress -1.505e+13 1.749e+08 -86027 <2e-16 ***
AU12_c_disgust_participant 5.192e+15 2.323e+08 22353082 <2e-16 ***
AU12_c_disgust_actress 4.827e+13 1.328e+08 363613 <2e-16 ***
AU12_c_joy_participant -3.501e+15 3.117e+08 -11231168 <2e-16 ***
AU12_c_joy_actress -4.656e+15 1.739e+08 -26770306 <2e-16 ***
AU12_r_disgust_participant 2.212e+15 1.605e+08 13785085 <2e-16 ***
AU12_r_disgust_actress 8.321e+14 1.191e+08 6988450 <2e-16 ***
AU12_r_joy_participant -4.983e+15 2.283e+08 -21823266 <2e-16 ***
AU12_r_joy_actress 3.552e+15 1.502e+08 23653687 <2e-16 ***
AU04_c_disgust_participant -1.684e+15 1.085e+08 -15518910 <2e-16 ***
AU04_c_disgust_actress 4.030e+15 1.590e+08 25337226 <2e-16 ***
AU04_c_joy_participant -3.286e+14 2.270e+08 -1447225 <2e-16 ***
AU04_c_joy_actress 8.127e+14 1.331e+08 6106961 <2e-16 ***
AU04_r_disgust_participant 7.285e+14 6.099e+07 11943527 <2e-16 ***
AU04_r_disgust_actress 1.613e+14 1.441e+08 1119680 <2e-16 ***
AU04_r_joy_participant -7.632e+14 1.165e+08 -6550339 <2e-16 ***
AU04_r_joy_actress -9.215e+14 1.291e+08 -7137298 <2e-16 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance:  65.135  on 46  degrees of freedom

Residual deviance: 865.048 on 22 degrees of freedom
(25 observations deleted due to missingness)
AIC: 915.05

Number of Fisher Scoring iterations: 12

Minimizing AIC with step-function

logitMinAIC <- step(logit_SIT)
Start: AIC=915.05
interaction_disorder ~ AU06_c_disgust_participant + AU06_c_disgust_actress +
AU06_c_joy_participant + AU06_c_joy_actress + AU06_r_disgust_participant +
AU06_r_disgust_actress + AU06_r_joy_participant + AU06_r_joy_actress +
AU12_c_disgust_participant + AU12_c_disgust_actress + AU12_c_joy_participant +
AU12_c_joy_actress + AU12_r_disgust_participant + AU12_r_disgust_actress +
AU12_r_joy_participant + AU12_r_joy_actress + AU04_c_disgust_participant +
AU04_c_disgust_actress + AU04_c_joy_participant + AU04_c_joy_actress +
AU04_r_disgust_participant + AU04_r_disgust_actress + AU04_r_joy_participant +
AU04_r_joy_actress

                            Df Deviance      AIC
- AU04_r_joy_actress         1     0.00    48.00
- AU04_r_joy_participant     1     0.00    48.00
- AU06_r_joy_participant     1     0.00    48.00
- AU12_r_joy_actress         1    18.09    66.09
- AU12_c_disgust_actress     1    22.04    70.04
- AU12_r_disgust_participant 1    22.51    70.51
- AU06_c_joy_actress         1    23.17    71.17
- AU12_r_disgust_actress     1    26.15    74.15
- AU06_c_disgust_participant 1    29.29    77.29
- AU06_c_disgust_actress     1   576.70   624.70
- AU04_c_joy_participant     1   576.70   624.70
- AU06_r_disgust_participant 1   648.79   696.79
- AU12_c_joy_actress         1   648.79   696.79
- AU12_r_joy_participant     1   648.79   696.79
- AU04_c_joy_actress         1   648.79   696.79
- AU06_c_joy_participant     1   720.87   768.87
- AU04_c_disgust_participant 1   720.87   768.87
- AU04_r_disgust_actress     1   720.87   768.87
- AU06_r_disgust_actress     1   792.96   840.96
- AU12_c_disgust_participant 1   792.96   840.96
- AU04_r_disgust_participant 1   792.96   840.96
- AU06_r_joy_actress         1   865.05   913.05
- AU12_c_joy_participant     1   865.05   913.05
<none>                           865.05   915.05
- AU04_c_disgust_actress     1   937.13   985.13
Step: AIC=48
interaction_disorder ~ AU06_c_disgust_participant + AU06_c_disgust_actress +
AU06_c_joy_participant + AU06_c_joy_actress + AU06_r_disgust_participant +
AU06_r_disgust_actress + AU06_r_joy_participant + AU06_r_joy_actress +
AU12_c_disgust_participant + AU12_c_disgust_actress + AU12_c_joy_participant +
AU12_c_joy_actress + AU12_r_disgust_participant + AU12_r_disgust_actress +
AU12_r_joy_participant + AU12_r_joy_actress + AU04_c_disgust_participant +
AU04_c_disgust_actress + AU04_c_joy_participant + AU04_c_joy_actress +
AU04_r_disgust_participant + AU04_r_disgust_actress + AU04_r_joy_participant

                            Df Deviance      AIC
- AU04_c_joy_actress         1     0.00    46.00
- AU04_r_joy_participant     1     0.00    46.00
<none>                             0.00    48.00
- AU04_r_disgust_actress     1    14.50    60.50
- AU04_c_disgust_actress     1    17.07    63.07
- AU12_c_joy_actress         1    21.57    67.57
- AU12_r_disgust_participant 1    22.54    68.54
- AU12_c_disgust_actress     1    23.49    69.49
- AU12_r_disgust_actress     1    26.17    72.17
- AU06_c_joy_actress         1    26.72    72.72
- AU06_c_disgust_participant 1    29.68    75.68
- AU12_c_joy_participant     1   432.52   478.52
- AU12_r_joy_participant     1   720.87   766.87
- AU12_r_joy_actress         1   720.87   766.87
- AU06_c_disgust_actress     1   792.96   838.96
- AU06_c_joy_participant     1   792.96   838.96
- AU06_r_joy_participant     1   792.96   838.96
- AU04_c_disgust_participant 1   792.96   838.96
- AU04_r_disgust_participant 1   792.96   838.96
- AU06_r_disgust_participant 1   865.05   911.05
- AU12_c_disgust_participant 1   865.05   911.05
- AU04_c_joy_participant     1   865.05   911.05
- AU06_r_disgust_actress     1   937.13   983.13
- AU06_r_joy_actress         1   937.13   983.13

Step: AIC=46
interaction_disorder ~ AU06_c_disgust_participant + AU06_c_disgust_actress +
AU06_c_joy_participant + AU06_c_joy_actress + AU06_r_disgust_participant +
AU06_r_disgust_actress + AU06_r_joy_participant + AU06_r_joy_actress +
AU12_c_disgust_participant + AU12_c_disgust_actress + AU12_c_joy_participant +
AU12_c_joy_actress + AU12_r_disgust_participant + AU12_r_disgust_actress +
AU12_r_joy_participant + AU12_r_joy_actress + AU04_c_disgust_participant +
AU04_c_disgust_actress + AU04_c_joy_participant + AU04_r_disgust_participant +
AU04_r_disgust_actress + AU04_r_joy_participant

                            Df Deviance      AIC
<none>                             0.00    46.00
- AU12_r_joy_actress         1    20.03    64.03
- AU12_c_joy_actress         1    21.81    65.81
- AU04_c_disgust_actress     1    22.95    66.95
- AU12_r_disgust_participant 1    22.98    66.98
- AU12_c_disgust_actress     1    24.56    68.56
- AU12_r_disgust_actress     1    26.96    70.96
- AU06_c_joy_actress         1    27.68    71.68
- AU06_c_disgust_participant 1    32.13    76.13
- AU06_r_joy_participant     1   720.87   764.87
- AU12_c_disgust_participant 1   720.87   764.87
- AU06_r_disgust_actress     1   792.96   836.96
- AU04_c_disgust_participant 1   792.96   836.96
- AU04_c_joy_participant     1   792.96   836.96
- AU04_r_disgust_actress     1   792.96   836.96
- AU06_c_disgust_actress     1   865.05   909.05
- AU06_r_disgust_participant 1   865.05   909.05
- AU06_r_joy_actress         1   865.05   909.05
- AU12_r_joy_participant     1   865.05   909.05
- AU04_r_disgust_participant 1   865.05   909.05
- AU06_c_joy_participant     1   937.13   981.13
- AU04_r_joy_participant     1   937.13   981.13
- AU12_c_joy_participant     1  1009.22  1053.22
There were 50 or more warnings (use warnings() to see the first 50)

summary(logitMinAIC)

Call:
glm(formula = interaction_disorder ~ AU06_c_disgust_participant +
AU06_c_disgust_actress + AU06_c_joy_participant + AU06_c_joy_actress +
AU06_r_disgust_participant + AU06_r_disgust_actress + AU06_r_joy_participant +
AU06_r_joy_actress + AU12_c_disgust_participant + AU12_c_disgust_actress +
AU12_c_joy_participant + AU12_c_joy_actress + AU12_r_disgust_participant +
AU12_r_disgust_actress + AU12_r_joy_participant + AU12_r_joy_actress +
AU04_c_disgust_participant + AU04_c_disgust_actress + AU04_c_joy_participant +
AU04_r_disgust_participant + AU04_r_disgust_actress + AU04_r_joy_participant,
family = binomial, data = data_matched)

Deviance Residuals:
Min 1Q Median 3Q Max
-5.055e-04 -2.000e-08 -2.000e-08 7.396e-05 3.250e-04

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 22.8 2977.2 0.008 0.994
AU06_c_disgust_participant 1361.9 138140.2 0.010 0.992
AU06_c_disgust_actress -1904.2 364510.2 -0.005 0.996
AU06_c_joy_participant 1035.5 214640.6 0.005 0.996
AU06_c_joy_actress -5862.9 580384.2 -0.010 0.992
AU06_r_disgust_participant -202.2 98583.0 -0.002 0.998
AU06_r_disgust_actress 3093.0 337734.4 0.009 0.993
AU06_r_joy_participant -3217.7 238710.1 -0.013 0.989
AU06_r_joy_actress 2557.0 220009.5 0.012 0.991
AU12_c_disgust_participant 3878.4 292720.6 0.013 0.989
AU12_c_disgust_actress 4521.5 331383.0 0.014 0.989
AU12_c_joy_participant -5626.2 480295.5 -0.012 0.991
AU12_c_joy_actress -4333.7 314866.5 -0.014 0.989
AU12_r_disgust_participant 2556.7 181815.7 0.014 0.989
AU12_r_disgust_actress -3384.0 257473.6 -0.013 0.990
AU12_r_joy_participant -1518.6 217619.9 -0.007 0.994
AU12_r_joy_actress 2006.4 178413.0 0.011 0.991
AU04_c_disgust_participant -689.2 96610.6 -0.007 0.994
AU04_c_disgust_actress 2680.8 185407.1 0.014 0.988
AU04_c_joy_participant -1201.9 113166.2 -0.011 0.992
AU04_r_disgust_participant 397.0 75316.5 0.005 0.996
AU04_r_disgust_actress -963.8 88661.4 -0.011 0.991
AU04_r_joy_participant 250.4 33341.1 0.008 0.994

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 6.5135e+01  on 46  degrees of freedom

Residual deviance: 7.4093e-07 on 24 degrees of freedom
(25 observations deleted due to missingness)
AIC: 46

Number of Fisher Scoring iterations: 25

  1. If you want to predict the outcome, why not just use what you've already done? If they are all significant that suggests that you have enough data to overcome multicollinearity.

  2. That link does a nice job of explaining how to detect multicollinearity. But the fact that multicollinearity exists doesn't mean it's a problem. Most of the time, it isn't.

My $0.02 is that what you're doing is just fine.
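
A small simulation may help illustrate the point that collinearity inflates coefficient standard errors without necessarily hurting predictions. Everything here (the data and the names x1, x2, y) is made up; it is a sketch under those assumptions, not a reanalysis of the data in this thread.

```r
# Two nearly collinear predictors; only x1 drives the outcome (simulated)
set.seed(123)
n  <- 500
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)      # almost identical to x1
y  <- rbinom(n, size = 1, prob = plogis(x1))
dat <- data.frame(y, x1, x2)

fit_both <- glm(y ~ x1 + x2, family = binomial, data = dat)
fit_one  <- glm(y ~ x1,      family = binomial, data = dat)

# Standard error of the x1 coefficient with and without its near-duplicate
se_both <- summary(fit_both)$coefficients["x1", "Std. Error"]
se_one  <- summary(fit_one)$coefficients["x1", "Std. Error"]
print(c(se_x1_with_x2 = se_both, se_x1_alone = se_one))

# Predicted probabilities from the two models, compared case by case
p_both <- predict(fit_both, type = "response")
p_one  <- predict(fit_one,  type = "response")
print(cor(p_both, p_one))
print(max(abs(p_both - p_one)))
```

The standard error for x1 should be much larger when x2 is included, while the two sets of predicted probabilities should stay very close. That is the sense in which collinearity matters for interpreting individual coefficients but much less for prediction.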

Alright and thanks, Mr.Startz - makes my life easier ;).

And still:

  1. Isn't it strange that all predictors are significant at first and after excluding 2 of them with the step function, all the other 22 predictors become insignificant?
  2. These guys say that "logistic regression requires there to be little or no multicollinearity among the independent variables" (Assumptions of Logistic Regression - Statistics Solutions).

That is unusual, although it can happen.
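
One mechanism that can produce this kind of output (residual deviance near zero, enormous standard errors, p-values near 1, and the "fitted probabilities numerically 0 or 1" warning) is perfect separation, where the predictors classify every observation correctly. Whether that is what happened with the poster's data is an assumption; the toy example below, with made-up x and y, only shows what the symptom looks like.

```r
# Toy data in which x separates the two classes perfectly (hypothetical)
x <- 1:20
y <- as.integer(x > 10)

# Collect warnings instead of printing them, so they can be inspected
warns <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    warns <<- c(warns, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(warns)                      # glm.fit warnings raised during fitting
print(deviance(fit))              # residual deviance, essentially zero
print(summary(fit)$coefficients)  # huge standard errors, p-values near 1
```

Although x predicts y perfectly here, its Wald test looks completely insignificant, because the maximum likelihood estimate does not exist and the standard errors blow up. With few observations and many predictors this situation becomes much more likely.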

"These guys" are wrong.
