Confusion regarding glmnet standardization

I have three questions regarding standardization in the glmnet package:

  1. Is it necessary to set standardize to TRUE in glmnet if my data set contains only categorical variables that were then converted into dummy variables?

  2. Is it necessary to set standardize to TRUE in glmnet if I have a data set with continuous variables that are all on the same scale?

  3. For either of these data sets, which approach is more advisable: standardizing in glmnet or not?

  1. If you only have indicators, then there is no need; they are already on the same scale.

  2. If they are already on the same scale, there is no need to use that option.

  3. My overall advice is to make sure that all of the predictors in the model are on the same scale. So if you have both continuous predictors and indicators, I would standardize everything. You cannot over-standardize in this context, and it is important that all predictors are penalized in the same way.
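
To illustrate why penalizing all predictors the same way requires a common scale, here is a minimal Python/NumPy sketch (not glmnet code) using the closed-form lasso solution for a single centered predictor with no intercept. It assumes a hypothetical distance predictor recorded once in meters and once in kilometers. With the same penalty `lam`, the kilometer version is shrunk far more, while standardizing first makes the two fits agree.

```python
import numpy as np

def lasso_1d(x, y, lam):
    """Closed-form lasso for one centered predictor, no intercept:
    minimizes (1 / (2n)) * ||y - x * b||^2 + lam * |b|."""
    n = len(y)
    rho = x @ y / n   # x'y / n
    z = x @ x / n     # x'x / n; rescaling x by c multiplies this by c**2
    # Soft-threshold rho by lam, then divide by x'x / n
    return np.sign(rho) * max(abs(rho) - lam, 0.0) / z

def standardize(x):
    return (x - x.mean()) / x.std()

rng = np.random.default_rng(0)
n = 200
dist_m = rng.normal(0.0, 500.0, n)   # hypothetical predictor in meters
y = 0.002 * dist_m + rng.normal(0.0, 1.0, n)
dist_m = dist_m - dist_m.mean()      # center predictor and response
y = y - y.mean()
dist_km = dist_m / 1000.0            # identical information in kilometers

lam = 0.1  # same penalty for both versions

# Raw units: the penalty bites much harder on the kilometer version
pred_m = lasso_1d(dist_m, y, lam) * dist_m
pred_km = lasso_1d(dist_km, y, lam) * dist_km
print("raw units, max |difference|:", np.max(np.abs(pred_m - pred_km)))

# Standardized: both versions become the same column, so the fits agree
pred_m_std = lasso_1d(standardize(dist_m), y, lam) * standardize(dist_m)
pred_km_std = lasso_1d(standardize(dist_km), y, lam) * standardize(dist_km)
print("standardized, max |difference|:", np.max(np.abs(pred_m_std - pred_km_std)))
```

The only thing that changed between the two raw fits is the unit of measurement, yet the amount of shrinkage differs; that is the sense in which unstandardized predictors are not "penalized in the same way".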

Thank you so much for the answer, @Max. It is exactly what I needed. One last question, if you don't mind: whether or not I standardize, the coefficients returned by glmnet are always on the original (unstandardized) scale of the data, right?

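On the original scale. The glmnet documentation for the standardize argument states that the coefficients are always returned on the original scale.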
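
The glmnet documentation for `standardize` says the coefficients are always returned on the original scale. The Python/NumPy sketch below (an illustration, not glmnet's implementation) shows the usual back-transformation: fit on standardized columns, then divide each slope by its column's standard deviation and adjust the intercept. Ordinary least squares stands in for the penalized fit, which is an assumption of the sketch; the conversion itself is pure algebra, since a + sum_j b_j (x_j - mu_j) / s_j = (a - sum_j b_j mu_j / s_j) + sum_j (b_j / s_j) x_j holds for any coefficients. The data are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3
# Hypothetical predictors on very different scales
X = rng.normal(size=(n, p)) * np.array([1.0, 50.0, 0.01]) + np.array([0.0, 200.0, 5.0])
y = X @ np.array([1.5, -0.03, 40.0]) + rng.normal(size=n)

# Standardize with the training means and standard deviations
mu = X.mean(axis=0)
sd = X.std(axis=0)
Z = (X - mu) / sd

# Fit on the standardized scale; least squares stands in for the penalized fit
Z1 = np.column_stack([np.ones(n), Z])
coef_std, *_ = np.linalg.lstsq(Z1, y, rcond=None)
a_std, b_std = coef_std[0], coef_std[1:]

# Back-transform slopes and intercept to the original scale of X
b_orig = b_std / sd
a_orig = a_std - np.sum(b_std * mu / sd)

# Same fitted values either way
pred_std = a_std + Z @ b_std
pred_orig = a_orig + X @ b_orig
print("original-scale slopes:", b_orig)
```

Because the two parameterizations give identical fitted values, reporting coefficients on the original scale loses nothing, and they can be read in the predictors' own units.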

One last thing: when you use predict(), you do not need to standardize the new data beforehand if you used that option in the fit; glmnet handles the scaling internally.
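
A rough Python/NumPy sketch of the idea: a hypothetical wrapper class (`StandardizedLinearModel`, not part of glmnet or any library) records the training means and standard deviations at fit time and applies them inside `predict()`. glmnet reaches the same end result by returning coefficients on the original scale. The sketch also shows the pitfall of standardizing new data yourself before calling `predict()`: the data end up scaled twice and the predictions are wrong. Ordinary least squares again stands in for the penalized fit, and the data are made up.

```python
import numpy as np

class StandardizedLinearModel:
    """Hypothetical wrapper (not glmnet's code): standardizes X at fit time,
    stores the training means and SDs, and reuses them in predict()."""

    def fit(self, X, y):
        self.mu_ = X.mean(axis=0)
        self.sd_ = X.std(axis=0)
        Z = (X - self.mu_) / self.sd_
        Z1 = np.column_stack([np.ones(len(y)), Z])
        # Ordinary least squares stands in for a penalized fit here
        coef, *_ = np.linalg.lstsq(Z1, y, rcond=None)
        self.intercept_, self.coef_std_ = coef[0], coef[1:]
        return self

    def predict(self, X_new):
        # New data are expected on the ORIGINAL scale; the stored
        # training statistics are applied here, not statistics of X_new
        Z_new = (X_new - self.mu_) / self.sd_
        return self.intercept_ + Z_new @ self.coef_std_

rng = np.random.default_rng(2)
X_train = rng.normal(loc=[10.0, 0.0], scale=[3.0, 0.1], size=(80, 2))
y_train = X_train @ np.array([0.5, 20.0]) + rng.normal(size=80)
X_new = rng.normal(loc=[10.0, 0.0], scale=[3.0, 0.1], size=(5, 2))

model = StandardizedLinearModel().fit(X_train, y_train)

# Correct: pass the raw new data and let predict() do the scaling
pred_ok = model.predict(X_new)

# Pitfall: standardizing X_new yourself first means it gets scaled twice
X_new_prestd = (X_new - X_new.mean(axis=0)) / X_new.std(axis=0)
pred_double = model.predict(X_new_prestd)
print(pred_ok)
print(pred_double)
```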
