I have fitted a multivariable logistic regression model in R using the glm function.
The dependent variable is, of course, binary.
I have 10 independent variables, which are a mix of dummy, categorical and numerical variables.
I have two questions: one about converting the results to risk differences and one about multicollinearity.
1) The conversion of the results to risk differences
With summary() on the fitted glm I get estimates on the log-odds scale for each variable, and I can calculate odds ratios (OR) from them. But I am interested in reporting risks and ultimately risk differences.
I would like to do that by calculating an average baseline risk and then "manipulating" one variable at a time (e.g. smoking 0 in one calculation and 1 in another calculation) to find the risk difference if a patient smokes.
I calculate the average baseline risk by multiplying the mean observed value of each dummy and categorical variable by its estimate, and the median observed value of each numerical variable by its estimate.
This is all great (I think??) and I can do it manually. But is there a faster way than doing it one at a time since I have 10 variables and it would then be a long piece of code?
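To make it concrete, here is a minimal sketch of the kind of calculation I mean, written with predict() and a loop instead of by hand. It runs on simulated data because I cannot share mine, and all variable names (smoking, male, age, outcome) are made up. It assumes the dummies are coded as numeric 0/1 so that taking their mean makes sense; I am not sure this is the right approach, only that it reproduces my manual numbers.

```r
# Simulated data; every variable name here is hypothetical
set.seed(1)
n <- 500
dat <- data.frame(
  smoking = rbinom(n, 1, 0.3),   # dummy (0/1)
  male    = rbinom(n, 1, 0.5),   # dummy (0/1)
  age     = rnorm(n, 60, 10)     # numerical
)
lin_pred <- -4 + 0.8 * dat$smoking + 0.3 * dat$male + 0.05 * dat$age
dat$outcome <- rbinom(n, 1, plogis(lin_pred))

fit <- glm(outcome ~ smoking + male + age, data = dat, family = binomial)

# "Average" patient: mean of the dummies, median of the numerical variables
baseline <- data.frame(
  smoking = mean(dat$smoking),
  male    = mean(dat$male),
  age     = median(dat$age)
)

# For each dummy: set it to 1 and to 0, keep everything else at baseline,
# and take the difference in predicted risk (type = "response" gives risks)
dummies <- c("smoking", "male")
risk_diff <- sapply(dummies, function(v) {
  exposed <- baseline
  exposed[[v]] <- 1
  unexposed <- baseline
  unexposed[[v]] <- 0
  unname(predict(fit, newdata = exposed, type = "response") -
           predict(fit, newdata = unexposed, type = "response"))
})
risk_diff
```

With 10 variables I would only need to extend the dummies vector, but I do not know whether this is a sensible way to get risk differences.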
2) Multicollinearity
I have calculated the variance inflation factor (VIF) for my independent variables using the vif function from the faraway package.
Should I put my independent variables into a linear regression (lm) and do vif(lm), or use my glm model and do vif(glm)?
And can I use VIF at all when I have those 3 different types of variables? I do get an output, so I guess it is okay then?
A reference would be great!
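To be explicit about what I mean by VIF: as far as I understand, the classical definition is 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. Below is a minimal base-R sketch of that definition on simulated data (the names x1, x2 and grp are made up). It works from the design matrix rather than through faraway, so it is not necessarily what vif() does internally.

```r
# Simulated predictors; names are hypothetical
set.seed(2)
n <- 300
x1  <- rnorm(n)                                  # numerical
x2  <- 0.7 * x1 + rnorm(n)                       # numerical, correlated with x1
grp <- factor(sample(c("a", "b", "c"), n, replace = TRUE))  # categorical

# Design matrix without the intercept; the factor is expanded into dummy columns
X <- model.matrix(~ x1 + x2 + grp)[, -1]

# VIF for column j: regress it on all other columns and use 1 / (1 - R^2)
vif_manual <- sapply(seq_len(ncol(X)), function(j) {
  r2 <- summary(lm(X[, j] ~ X[, -j]))$r.squared
  1 / (1 - r2)
})
names(vif_manual) <- colnames(X)
vif_manual
```

Note that in this sketch the categorical variable gets one VIF per dummy column rather than a single value, which is part of why I am unsure how to interpret the output with mixed variable types.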
Wow. I hope this makes sense! Unfortunately I am not allowed to share the data with you guys.
Thanks in advance!