Does it still make sense to check the linearity of log-odds for a regularized logistic regression?

konradino · March 28, 2018, 8:02pm

I was asked such a question recently from one of our partners. They apply a very traditional approach of statistical modelling where they build a logistic regression model, arrive at the final form by using stepwise (that was shocking for me - who even does that?), make sure all the p-values are met, along with univariate Ginis and check of course the linearity of log-odds (as well as a couple of other assumptions).

As a response I proposed that a different linear algorithm could be used such as: lasso, ridge or simply elastic-net to arrive at a more powerful model with good generalization properties. As I was commonly used to by reading all different materials on the topic, by applying such a shrinkage algorithm we do not need to worry so much about some of the points mention before, because the embedded feature selection aspect of the model takes care of that to a great extent.

However, I was taken by surprise a bit when the partner asked me to compare both models statistical properties and e.g. verifying the linearity of log-odds of the shrinkage model. Would that actually even make sense? Today's goal orientated predictive modelling aims at maximizing predictive performance, of course with a proper train, CV, test scheme, and is not bothered so much about meeting theoretical assumptions.

What is your take here?

andrie · April 3, 2018, 11:48am

I agree with your assessment that lasso, ridge regression or elastic net is much more appropriate. I think that the current stati

Your question reminds me of a twitter disussion, way back in 2013: https://twitter.com/hadleywickham/status/316600944475402240

In this discussion, Revolution Analytics announced support for stepwise regression, since SAS customer insisted on having stepwise in R.

In this thread, @hadley replied in no uncertain terms:

you should [have] just implemented it by randomly choosing variables b/c that's about as good as stepwise

why is this a good thing? stepwise regression is BAAAAD. Please implement modern methods that don't have awful statistical issues

You can easily find many other resource that explain why stepwise regression is bad.

For example, Andrew Gelman calls stepwise regression a joke and suggests lasso instead.
On the Stata support website there is a list of 10 reasons why stepwise regression is bad. This was compiled by Frank Harrell in 1996. Frank Harrell is famous in the R community and contributed many packages, including Hmisc
Finally, this question on StackExchange asking for advantages of stepwise regression - every answer explains why stepwise is a bad idea.

I think the consensus is that stepwise regression is fast. So if you want to get the wrong results fast, use stepwise.

If you want a useful model that generalizes well, look at lasso regression instead.

konradino · April 3, 2018, 2:19pm

Thanks @andrie, I completely agree with the statement that applying regularization methods should be the way to go - no question about it! The list of stepwise flaws is definitely very helpful to convince the partner which method is superior.

However, I'm also very convinced that the partner will be looking for a very detailed documentation of validating lasso/ ridge model properties/ requirements (similarly to what can be done for logistic regression, and mentioned in my original question). So let me rephrase my question:

Are there really any such properties/ requirements to validate apart from general model performance test?
If there are any statistical properties/ requirements, what exactly should be tested?

Max · April 3, 2018, 7:36pm

Resampling would be your best bet here. This would capture the randomness in selected variables by all of the methods and you can use it to make probabilistic statements about the model's performance as well as the differences between models.

One other thought about validation is that none of these methods (esp stepwise) have good methods of computing statistical significance of the predictors. Stepwise will give you all of the statistics but they are incorrect since they don't factor in the uncertainty of all of the models that came before. There are Bayesian analogs and these would produce posterior distributions for each predictors but, other than that, if you want statistical significance statements, you won't get it from these.

konradino · April 5, 2018, 11:25am

Thank you for your responses! I think all that should be sufficient.