Elastic net model fitting with glmnet & caret produces worse fit when more data is provided?

Grobi · September 9, 2021, 9:50am

Hello Community,

I am using glmnet and caret to iteratively fit forecast models in a time series context. See my code below.

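library(caret)  # train() and trainControl(); method = "glmnet" also needs the glmnet package installed

# Tuning grid: alpha from 0 (ridge) to 1 (lasso) in 0.1 steps, 30 lambda values from 0 to 300
alpha  <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1)
lambda <- seq(0, 300, length.out = 30)

# Expanding-window time series resampling: start with 30 points, forecast the next 3
elastic <- train(
  var ~ ., data = t, method = "glmnet",
  trControl = trainControl(method = "timeslice", initialWindow = 30, horizon = 3, fixedWindow = FALSE),
  metric = "RMSE", tuneGrid = expand.grid(alpha = alpha, lambda = lambda)
)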

I am fitting the model on 45 data points and use timeslice resampling for validation (I know the number of resampling iterations should be higher, but there is simply no more data). The problem I face is this: I am fitting multiple models iteratively, and for each model I add another block of variables to see whether that block improves the model. However, when I fit a model with 35 candidate variables, the fit suddenly becomes noticeably worse in terms of both RMSE and R squared, and not just marginally: RMSE rises from 2000 to 2500 and R squared drops from 0.88 to 0.85.
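For reference, the resampling scheme described above can be reproduced on row indices alone with caret's createTimeSlices(), which builds the same kind of expanding-window splits. This is only a sketch: the 45 rows are taken from the description, and the variable names n_obs, slices and n_slices are made up for illustration.

```r
library(caret)

# Assumption: 45 time-ordered observations, as stated in the post.
n_obs <- 45

# Same settings as in the trainControl() call: expanding window,
# 30 initial training rows, 3 held-out rows per slice.
slices <- createTimeSlices(
  seq_len(n_obs),
  initialWindow = 30,
  horizon = 3,
  fixedWindow = FALSE
)

n_slices <- length(slices$train)
n_slices                       # number of resampling iterations
range(lengths(slices$train))   # smallest and largest training window
slices$test[[1]]               # row indices held out in the first slice
```

With these settings there are 13 slices, and each one is evaluated on only 3 held-out rows, so the per-slice RMSE and R squared that get averaged are based on very few points.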

The previously fitted model included only a set of 13 independent variables, and the selected lambda was 0 (so basically just a linear regression). The model with the worse fit has a set of 35 independent variables, which includes all 13 variables from the previous model, and it ended up with a lambda of 41 and an alpha of 1 (lasso regression).
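One way to see what the selected lasso model actually uses is to look at the tuned parameters and the non-zero coefficients of the final fit. The sketch below runs on simulated stand-in data, since the real data cannot be shared; the data frame sim, the coefficient values, and the smaller tuning grid are all assumptions made for illustration only.

```r
library(caret)
library(glmnet)

set.seed(1)

# Assumption: simulated stand-in data with 45 rows and 35 predictors,
# of which only x1 and x2 actually influence the response.
n <- 45
X <- matrix(rnorm(n * 35), nrow = n, dimnames = list(NULL, paste0("x", 1:35)))
sim <- data.frame(var = 3 * X[, 1] - 2 * X[, 2] + rnorm(n), X)

fit <- train(
  var ~ ., data = sim, method = "glmnet",
  trControl = trainControl(method = "timeslice", initialWindow = 30,
                           horizon = 3, fixedWindow = FALSE),
  metric = "RMSE",
  # Assumption: a reduced grid to keep the example fast.
  tuneGrid = expand.grid(alpha = c(0, 0.5, 1), lambda = seq(0, 2, length.out = 10))
)

fit$bestTune   # alpha and lambda chosen by resampled RMSE

# Coefficients of the final model at the selected lambda
betas <- as.matrix(coef(fit$finalModel, s = fit$bestTune$lambda))
kept  <- setdiff(rownames(betas)[betas[, 1] != 0], "(Intercept)")
kept           # predictors with non-zero coefficients
```

Applied to the real fit, the same two lines on elastic$bestTune and elastic$finalModel would show whether the 13 original variables survive the lasso penalty or are partly replaced by variables from the new block.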

For model fitting I test lambda values from 0 to 300 and alpha values from 0 to 1 in 0.1 steps.
Unfortunately, I cannot post my data due to policy restrictions.

My question: How is it possible that R squared and the general fit become worse when more variables are fed to the model? My intuition is that the model should be at least as good as the previous one, given that all 13 variables from that model are also available to this one. Is this possibly an issue with the packages / algorithms, or am I doing something wrong?
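It may help to separate in-sample fit from resampled fit here. The "at least as good" intuition holds for in-sample R squared of nested least-squares models, whereas the RMSE and R squared that caret reports from train() are computed on held-out resamples, where no such guarantee exists. The base R sketch below illustrates the difference on simulated data; the data, the 40/5 split and the helper rmse() are assumptions for illustration, not the setup from the post.

```r
set.seed(42)

# Assumption: simulated data with 45 rows; only the first 13 predictors
# carry signal, the remaining 22 are pure noise.
n <- 45
X <- matrix(rnorm(n * 35), nrow = n, dimnames = list(NULL, paste0("x", 1:35)))
y <- drop(X[, 1:13] %*% rep(1, 13)) + rnorm(n)
dat <- data.frame(y = y, X)

# Hypothetical split, chosen so that OLS with 35 predictors is still estimable.
train_rows <- 1:40
test_rows  <- 41:45

small <- lm(y ~ ., data = dat[train_rows, c("y", paste0("x", 1:13))])
large <- lm(y ~ ., data = dat[train_rows, ])

# Root mean squared error of a model on the given rows
rmse <- function(model, rows) {
  sqrt(mean((dat$y[rows] - predict(model, dat[rows, ]))^2))
}

# In-sample R squared: the larger nested model cannot do worse.
r2_in <- c(small = summary(small)$r.squared, large = summary(large)$r.squared)

# Held-out RMSE: nothing forces the larger model to win here.
rmse_out <- c(small = rmse(small, test_rows), large = rmse(large, test_rows))

r2_in
rmse_out
```

Comparing r2_in with rmse_out shows the two kinds of metric side by side: the first is guaranteed not to decrease when variables are added, the second can move in either direction.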

Thanks a lot for any help in advance!
