Thank you in advance for the read. I have been working with a data set at work using xgboost (via `caret`), setting my seed for reproducibility and tuning the parameters. When I pass a full `expand.grid` to `train()`, I get higher accuracy on the model (and better predictions on my test set) than when I pass the same best parameters (found via `model$bestTune`) as a single-row grid. I've done my best to build a reproducible example but am having a hard time doing so, which leads me to think the model may be overfit. Note that in my real-world model I've already shrunk the `expand.grid` to a more optimized size (in case that is someone's suggestion). I've also removed the seed to see how stable the model accuracy is, and it is quite variable: 76% on the test set is the highest I've seen, and six other models give 61%–73%.

Any ideas on why this is? In my real-world work, test-set accuracy drops from 76% to about 71% with this one change. The test set is 20% of the data (n = 167).

In case it helps, the grid search is:

max_depth = c(3, 4, 5),
nrounds = seq(from = 25, to = 95, by = 10),
eta = c(0.025, 0.05, 0.1),
gamma = 0,
colsample_bytree = c(0.6, 0.8),
min_child_weight = 1,
subsample = 1
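Written out as the actual `expand.grid` call (a sketch reconstructed from the values above; the grid object name and the surrounding `train()` setup are assumptions):

```r
# Hypothetical reconstruction of the tuning grid described above
tune_grid <- expand.grid(
  max_depth        = c(3, 4, 5),
  nrounds          = seq(from = 25, to = 95, by = 10),
  eta              = c(0.025, 0.05, 0.1),
  gamma            = 0,
  colsample_bytree = c(0.6, 0.8),
  min_child_weight = 1,
  subsample        = 1
)
nrow(tune_grid)  # 3 * 8 * 3 * 2 = 144 candidate models
```

With 144 candidates evaluated per resample, the winning cross-validation accuracy is the maximum of 144 noisy estimates, which is itself a source of optimism.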

The best tune is:

max_depth = 3,
nrounds = 65,
eta = 0.1,
gamma = 0,
colsample_bytree = 0.6,
min_child_weight = 1,
subsample = 1

Since I can't come up with a reprex that actually reproduces the problem (I tried three different times and got stable results), I am asking this in a more theory sense rather than "how do I make this code work."
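One plausible mechanism, offered as a guess rather than a diagnosis: even with `set.seed()` at the top, `caret` consumes random numbers while fitting every candidate in the grid, so by the time the final model is refit on the full training set, the RNG is in a different state than it is when `train()` receives a one-row grid. Since the final xgboost fit is itself stochastic here (`colsample_bytree = 0.6` samples columns), the two refits can legitimately differ. `trainControl(seeds = ...)` pins the seed used for each resample and for the final model, which lets you test this. A sketch, assuming a generic data frame `dat` with outcome `y` and 5-fold CV (all of those names and numbers are placeholders):

```r
library(caret)

# One-row grid holding the bestTune values from the question
best_grid <- data.frame(
  max_depth = 3, nrounds = 65, eta = 0.1, gamma = 0,
  colsample_bytree = 0.6, min_child_weight = 1, subsample = 1
)

# seeds must be a list of length (number of resamples) + 1:
# one vector per resample (length = number of candidate models),
# plus a single integer for the final full-data refit.
n_resamples <- 5
ctrl <- trainControl(
  method = "cv", number = n_resamples,
  seeds = c(
    lapply(seq_len(n_resamples), function(i) rep(i, nrow(best_grid))),
    list(999)   # seed for the final model fit
  )
)

set.seed(42)
fit_fixed <- train(y ~ ., data = dat, method = "xgbTree",
                   trControl = ctrl, tuneGrid = best_grid)
```

If the last element of `seeds` is held constant across the grid-search run and the fixed-grid run, the final refit should match; any gap that remains between the two workflows would then point to selection optimism (picking the best of many noisy CV estimates) rather than RNG state.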
