What are the negatives of the lasso model?


#1

I’ve been working on trying to build a predictive model. I have 26 predictor variables (all continuous) and one response, also continuous, with n = 52 observations in the training set. I think linear regression is the way to go, and I have read the relevant chapters of Applied Predictive Modeling and Chapter 6 of An Introduction to Statistical Learning (Linear Model Selection and Regularization).

Both books start simple and then introduce new models. As I read, each model sounds better than the last until they get to the lasso, which sounds like exactly what I want: it will basically choose the necessary predictor variables (and get rid of the rest?) and build a relatively accurate model. While the other models mentioned sound good on some points, there is always a trade-off.
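To make "choose the necessary predictors and get rid of the rest" concrete, here is a minimal sketch assuming Python with scikit-learn. The data are simulated (52 rows, 26 continuous predictors, only the first four truly related to the response); that setup is an assumption for illustration, not the poster's data. The lasso "drops" a predictor by setting its coefficient exactly to zero, and `LassoCV` picks the penalty strength by cross-validation.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Simulated stand-in data (an assumption): 52 rows, 26 continuous predictors,
# and only the first 4 predictors actually affect the response.
rng = np.random.default_rng(0)
n, p = 52, 26
X = rng.normal(size=(n, p))
true_coef = np.zeros(p)
true_coef[:4] = [3.0, -2.0, 1.5, 1.0]
y = X @ true_coef + rng.normal(scale=1.0, size=n)

# Standardize so the L1 penalty treats every predictor on the same scale,
# then choose the penalty strength (alpha) by 5-fold cross-validation.
model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
model.fit(X, y)

lasso = model[-1]                      # the fitted LassoCV step
selected = np.flatnonzero(lasso.coef_)  # predictors with non-zero coefficients
print("chosen alpha:", lasso.alpha_)
print("predictors kept:", selected)
print("predictors dropped:", p - selected.size)
```

The dropped predictors are simply those whose coefficients came out exactly zero at the chosen alpha; a different alpha (or different data) can keep a different set.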

Should I commit to learning the code and follow through with this? Or is lasso too good to be true?


#2

In my experience, it’s always a trade-off.
In the case of the lasso, the estimates are not unbiased: the penalty shrinks the coefficients toward zero. That errs on the side of smaller coefficients, which in many cases can be viewed as a good thing (more conservative).
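The shrinkage can be seen by fitting ordinary least squares and the lasso to the same data. This is a sketch assuming scikit-learn and simulated data; the penalty `alpha=0.1` is an arbitrary illustrative value, not a tuned one. Because the lasso minimizes the residual sum of squares plus an L1 penalty, the total absolute size of its coefficients can never exceed that of the least-squares fit.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Simulated data (an assumption): 4 real effects among 26 predictors.
rng = np.random.default_rng(1)
n, p = 52, 26
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:4] = [3.0, -2.0, 1.5, 1.0]
y = X @ beta + rng.normal(size=n)

ols = LinearRegression().fit(X, y)
# alpha=0.1 is an arbitrary illustrative penalty, not a cross-validated choice.
lasso = Lasso(alpha=0.1).fit(X, y)

# Compare estimates for the four truly non-zero coefficients.
for j in range(4):
    print(f"x{j}: true={beta[j]:+.2f}  ols={ols.coef_[j]:+.2f}  lasso={lasso.coef_[j]:+.2f}")

# The lasso's L1 norm is never larger than the unpenalized least-squares L1 norm.
print("L1 norm  ols:", np.abs(ols.coef_).sum(), " lasso:", np.abs(lasso.coef_).sum())
```

With uncorrelated (orthonormal) predictors this shrinkage has a simple closed form, soft-thresholding: each least-squares coefficient is moved toward zero by a fixed amount and set to zero if it would cross it.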
Definitely try it (for help with variable selection in other models, if nothing else), but it’s always good to try out a few models and compare how the results differ, especially when power is low (not unlikely with n = 52).
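One way to "try out a few models and compare" is to score each one with the same cross-validation splits. The sketch below assumes scikit-learn and simulated data shaped like the poster's; the candidate list (least squares, ridge, lasso) and the ridge alpha grid are illustrative choices. The penalty tuning inside `RidgeCV`/`LassoCV` runs only on each training fold, so the held-out fold never influences the chosen alpha.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression, RidgeCV
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Simulated data (an assumption), same shape as in the question.
rng = np.random.default_rng(2)
n, p = 52, 26
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:4] = [3.0, -2.0, 1.5, 1.0]
y = X @ beta + rng.normal(size=n)

candidates = {
    "ols": make_pipeline(StandardScaler(), LinearRegression()),
    "ridge": make_pipeline(StandardScaler(), RidgeCV(alphas=np.logspace(-3, 3, 25))),
    "lasso": make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0)),
}

# Every model is scored on the same shuffled 5-fold split.
outer = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, est in candidates.items():
    scores = cross_val_score(est, X, y, cv=outer, scoring="neg_mean_squared_error")
    results[name] = (-scores.mean(), scores.std())
    print(f"{name:>5}: CV MSE = {-scores.mean():.2f} (sd {scores.std():.2f})")
```

With only 52 rows, the fold-to-fold spread is often as large as the gap between models, so it is worth looking at the standard deviations, not just the means.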


#3

The LASSO is definitely a great model. It runs efficiently and, as you’ve stated, it does a certain amount of its own automatic variable selection. One thing to be aware of is that the variable selection can be somewhat random, particularly if some of your predictors are highly correlated: the lasso tends to keep one predictor from a correlated group and drop the others, and which one it keeps can change with small perturbations of the data. Don’t let the whims of whatever optimization algorithm you’re using and a random seed replace your own domain knowledge and due diligence in doing variable selection.
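A quick way to check how stable the selection is on your own data is to refit the lasso on bootstrap resamples and count how often each predictor is kept. This is a sketch assuming scikit-learn; the data are simulated with predictor 1 built as a near-copy of predictor 0, and `n_boot = 50` is an arbitrary choice to keep the run short. A selection frequency well below 1 for either of the correlated pair suggests the lasso is choosing between them somewhat arbitrarily.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Simulated data (an assumption): x1 is a near-copy of x0 (correlation ~0.99),
# and both, plus x2, genuinely affect the response.
rng = np.random.default_rng(3)
n, p = 52, 26
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)
y = 2.0 * X[:, 0] + 2.0 * X[:, 1] + X[:, 2] + rng.normal(size=n)

n_boot = 50  # arbitrary; more resamples give smoother frequencies
picked = np.zeros(p)
for _ in range(n_boot):
    idx = rng.integers(0, n, size=n)  # bootstrap resample of the rows
    fit = LassoCV(cv=5, random_state=0).fit(X[idx], y[idx])
    picked += fit.coef_ != 0          # count predictors with non-zero coefficients

freq = picked / n_boot
print("selection frequency for x0, x1, x2:", freq[:3])
```

Predictors that are kept in nearly every resample are the ones the data support most consistently; the rest are where domain knowledge should carry more weight.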