Boosted regression trees

The documentation for xgboost is here:

https://xgboost.readthedocs.io/en/latest/

Why not just use basic gbm?

Note that it is a bad idea to encode a multinomial outcome as an integer unless the function specifically requires it. You did ask for a "regression tree" at first, but you want a multinomial outcome, which implies classification rather than regression. Factors are the preferred way to store a categorical outcome in R.

# this works:
library(gbm)

# gbm requires a factor outcome for multinomial classification
data.gbm$Y <- factor(paste0("cls", data.gbm$Y))

fit.step <- gbm(
  Y ~ .,
  data = data.gbm,
  distribution = "multinomial",
  interaction.depth = 5,
  shrinkage = 0.005,
  bag.fraction = 1,
  n.minobsinnode = 2
)

Note the argument names are consistent with gbm too.
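As a quick usage sketch (not from the thread), predictions come from predict(), which for gbm objects requires n.trees; the new-data name and the use of gbm's default of 100 trees are assumptions here:

# a sketch, assuming data.new holds the same predictors as data.gbm;
# n.trees = 100 matches gbm's default number of trees
probs <- predict(fit.step, newdata = data.new, n.trees = 100, type = "response")
# for multinomial fits this is an n x class x 1 array of class probabilities
pred.cls <- dimnames(probs)[[2]][apply(probs[, , 1], 1, which.max)]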


For the basic gbm you provided, do you have source code and documentation that would make it easy for a beginner like me to learn?

Thank you, Max. You have really been helpful. I will try to fit the model using the basic gbm you provided and then let you know whether it worked.

No problem. I can see how it would be frustrating.

There used to be a vignette for this package, but it isn't on the CRAN version now. There is a PDF here. It looks fairly outdated, though.

Honestly, this is exactly the kind of frustrating thing that led me to create caret.

I have managed to fit my model using the gbm.step and gbm.simplify functions. How do I determine the predictive performance of my model?

There is the out-of-bag estimate that comes with gbm, but you can also use caret to estimate performance via resampling. You probably need to tune the model too.
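As a minimal sketch of both options (the tuning-grid values below are illustrative assumptions, not recommendations):

# out-of-bag estimate of the best iteration; note that this requires
# the model to have been fit with bag.fraction < 1
best.iter <- gbm.perf(fit.step, method = "OOB")

# resampling and tuning with caret
library(caret)

ctrl <- trainControl(method = "cv", number = 10)

grid <- expand.grid(
  n.trees = c(100, 500, 1000),
  interaction.depth = c(1, 3, 5),
  shrinkage = c(0.005, 0.01),
  n.minobsinnode = 2
)

set.seed(1)
fit.tuned <- train(
  Y ~ .,
  data = data.gbm,
  method = "gbm",
  trControl = ctrl,
  tuneGrid = grid,
  verbose = FALSE
)
fit.tuned  # prints resampled accuracy across the grid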

See the caret web pages and the modeling workshop notes for general background.
