h2o ensemble models comparison

Hi,

I used H2O's random forest (RF) and GBM as base models for a stacked ensemble and got the results below. I have run the stacked ensemble on several different datasets and consistently see a 6–10% gap between the training and testing R-squared of the stacked model. Has anyone encountered the same issue? Also, would you consider a model flawed if there is a 10% gap in R-squared between the training and testing scores? Is that overfitting?

> rfmodel <- h2o.randomForest(x = 1:8, y = 9, training_frame = train, validation_frame = valid,
+                             ntrees = 500, nfolds = 6, seed = 1,
+                             keep_cross_validation_predictions = TRUE)
> gbmmodel <- h2o.gbm(x = 1:8, y = 9, training_frame = train, validation_frame = valid,
+                     ntrees = 1000, nfolds = 6, seed = 1, distribution = "gaussian",
+                     keep_cross_validation_predictions = TRUE)
> ensemble <- h2o.stackedEnsemble(x = 1:8, y = 9, training_frame = train, validation_frame = valid,
+                                 base_models = list(rfmodel, gbmmodel))
> h2o.r2(gbmmodel, xval = TRUE, valid = TRUE)
    valid      xval 
0.9321566 0.9260021 
> h2o.r2(rfmodel, train = TRUE, valid = TRUE)
    train     valid 
0.8985851 0.8951112 
> h2o.r2(ensemble, train = TRUE, valid = TRUE)
    train     valid 
0.9826646 0.9040009 

> h2o.rmse(ensemble, train = TRUE, valid = TRUE)
   train    valid 
1.429667 4.257925
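For what it's worth, training metrics on tree ensembles (and on a stacked metalearner) are usually optimistic, so the gap is better judged against a frame the model never saw. A minimal sketch of how that comparison could be done, reusing `ensemble` from above; the `test` frame here is a hypothetical held-out split (not from the original post), carved off before fitting:

```r
# Assumption: `data` is the full H2OFrame the original `train`/`valid`
# frames came from; split off a held-out test set before fitting.
splits <- h2o.splitFrame(data, ratios = c(0.7, 0.15), seed = 1)
train <- splits[[1]]; valid <- splits[[2]]; test <- splits[[3]]

# Score the fitted ensemble on the frame it never saw during training.
perf_test <- h2o.performance(ensemble, newdata = test)
h2o.r2(perf_test)    # R-squared on unseen data
h2o.rmse(perf_test)  # RMSE on unseen data

# Compare against the validation metric rather than the training one;
# a large train-vs-test gap with a small valid-vs-test gap suggests the
# training number is simply optimistic, not that the model is unusable.
h2o.r2(ensemble, valid = TRUE)
```

This requires a running H2O cluster (`h2o.init()`), so the snippet is a sketch of the evaluation pattern rather than a drop-in script.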

Hello guys, can anyone help?
