One of our partners recently asked me the following question. They follow a very traditional statistical-modelling approach: they build a logistic regression model, arrive at its final form by stepwise selection (that was shocking to me; who even does that?), make sure all the p-values are significant, check the univariate Ginis and, of course, verify the linearity of the log-odds (as well as a couple of other assumptions).
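The univariate Gini mentioned above is commonly defined as Gini = 2 * AUC - 1, where the AUC is computed using a single predictor on its own as the score. A minimal stdlib Python sketch, assuming the rank-based (Mann-Whitney) definition of AUC with ties counted as one half; the function names are illustrative, and the pairwise loop is quadratic, so it is only meant for small examples:

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def univariate_gini(feature, labels):
    """Gini = 2 * AUC - 1, using the raw feature as the score.

    A negative value means the feature is inversely related to the target;
    practitioners often report the absolute value.
    """
    return 2.0 * auc(feature, labels) - 1.0


# Toy example: one feature and a binary target
feature = [0.1, 0.4, 0.35, 0.8]
target = [0, 0, 1, 1]
print(univariate_gini(feature, target))  # 0.5
```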

In response, I proposed using a different linear algorithm, such as lasso, ridge or elastic-net, to arrive at a more powerful model with good generalization properties. From the various materials I have read on the topic, my understanding was that with such a shrinkage algorithm we do not need to worry as much about some of the points mentioned above, because the embedded feature selection of lasso and elastic-net (ridge only shrinks coefficients towards zero without setting any of them exactly to zero) takes care of them to a great extent.
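To make the shrinkage idea concrete, here is a minimal, dependency-free Python sketch of elastic-net logistic regression fitted by proximal gradient descent. It assumes the glmnet-style objective: mean log-loss + lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2), with an unpenalized intercept (alpha = 1 gives lasso, alpha = 0 gives ridge). The names `fit_elastic_net_logistic`, `make_toy_data`, `lam` and `alpha` are hypothetical. In practice one would use an established library and choose `lam` by cross-validation.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrinks v towards zero, exactly 0 if |v| <= t
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_elastic_net_logistic(X, y, lam=0.1, alpha=0.5, n_iter=500):
    """Proximal gradient descent (ISTA) on
    mean log-loss + lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2).
    The intercept b0 is not penalized."""
    n, p = len(X), len(X[0])
    # Conservative step 1/L: the log-loss Hessian is at most 0.25 * X~'X~/n
    # (X~ = X plus an intercept column), whose top eigenvalue is at most its trace
    trace = 1.0 + sum(xij * xij for xi in X for xij in xi) / n
    lr = 1.0 / (0.25 * trace + lam * (1.0 - alpha))
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Gradient of the mean log-loss at the current (b0, beta)
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, xi))) - yi
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 -= lr * g0 / n
        for j in range(p):
            # The ridge part is smooth, so it goes into the gradient step...
            grad_j = g[j] / n + lam * (1.0 - alpha) * beta[j]
            # ...while the lasso part is handled by soft-thresholding
            beta[j] = soft_threshold(beta[j] - lr * grad_j, lr * lam * alpha)
    return b0, beta


def make_toy_data(n=200, seed=0):
    # Hypothetical toy data: only the first two of three standard-normal features matter
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(3)]
        p_true = sigmoid(-0.5 + 2.0 * x[0] - 1.0 * x[1])
        X.append(x)
        y.append(1 if rng.random() < p_true else 0)
    return X, y


X, y = make_toy_data()
for lam in (0.01, 0.1, 1.0):
    b0, beta = fit_elastic_net_logistic(X, y, lam=lam, alpha=1.0)
    print(lam, [round(b, 3) for b in beta])
```

As `lam` grows, coefficients shrink and, with alpha > 0, some become exactly zero. This exact sparsity is the "embedded feature selection" referred to above.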

However, I was a bit taken by surprise when the partner asked me to compare the statistical properties of both models, for example to verify the linearity of the log-odds for the shrinkage model. Would that even make sense? Today's goal-oriented predictive modelling aims at maximizing predictive performance (with a proper train, cross-validation and test scheme, of course) and is not so concerned with meeting theoretical assumptions.
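For reference, a penalized logistic regression has the same model form as the unpenalized one, logit(p) = b0 + x'beta. The penalty changes how the coefficients are estimated, not the functional form, so a univariate linearity check can be run on either model's predictors in the same way. Below is a minimal stdlib Python sketch of one common version of that check. It assumes equal-count binning of a predictor, an empirical logit per bin with a 0.5 continuity correction, and a straight-line fit through those points. The function names and the use of R^2 as a summary are illustrative choices, not a formal test.

```python
import math


def empirical_logits(x, y, n_bins=5):
    """Sort by x, split into (nearly) equal-count bins and return
    (mean of x, empirical logit) for each bin."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    if n < n_bins:
        raise ValueError("need at least one observation per bin")
    points = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        events = sum(yi for _, yi in chunk)
        non_events = len(chunk) - events
        # The 0.5 continuity correction keeps the logit finite in all-0 or all-1 bins
        logit = math.log((events + 0.5) / (non_events + 0.5))
        mean_x = sum(xi for xi, _ in chunk) / len(chunk)
        points.append((mean_x, logit))
    return points


def straight_line_r2(points):
    """Ordinary least-squares line through the (x, logit) points and its R^2.
    An R^2 well below 1 (or a visibly curved pattern) hints at non-linearity."""
    xs = [px for px, _ in points]
    ys = [py for _, py in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((px - mx) ** 2 for px in xs)
    sxy = sum((px - mx) * (py - my) for px, py in points)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((py - (intercept + slope * px)) ** 2 for px, py in points)
    ss_tot = sum((py - my) ** 2 for py in ys)
    return 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot


# Toy example with 10 observations and 2 bins
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
print(empirical_logits(x, y, n_bins=2))
```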

What is your take here?