AdaBoost in R: cross-validation with AUC

Is there a way to train an AdaBoost model using cross-validation with AUC as the metric in tidymodels? I was thinking that maybe one of the engines parsnip offers would let me modify the loss function (to the exponential loss, since that is the one AdaBoost uses) and then set tree_depth to 1 in order to have stumps. If not, is there another function to cross-validate AdaBoost using AUC?

Thanks for the post, @mcv97!

There's no implementation of AdaBoost in tidymodels or its extension packages, to my knowledge. We had a question about AdaBoost on the parsnip issue tracker a while back, and I've excerpted my response to that issue here:

We don't plan on supporting adaptive boosting in parsnip or any extension packages that we maintain, for now. We do support boosted trees via C5.0, which loosely build on the ideas underlying AdaBoost implementations and are generally more versatile and tunable models.
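
Stepping outside the excerpt for a second: that C5.0 route already works with AUC-based cross-validation out of the box. A minimal sketch, assuming example data dat with a two-level factor outcome Class (the trees value is illustrative):

library(tidymodels)

# a boosted-tree spec using the C5.0 engine
spec <- boost_tree(trees = 50) %>%
  set_mode("classification") %>%
  set_engine("C5.0")

# 10-fold cross-validation, scored on area under the ROC curve
fit_resamples(
  spec,
  Class ~ .,
  resamples = vfold_cv(dat, v = 10),
  metrics = metric_set(roc_auc)
)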

Adaptive boosting is supported in caret via the "ada" method, and the underlying ada package includes three of the classic AdaBoost variants (discrete, real, and gentle) via the type argument to ada().

The caret interface would look something like:

library(caret)
#> Loading required package: ggplot2
#> Loading required package: lattice

# generate example data
train_dat <- twoClassSim()

# note that Class is a two-level factor
head(train_dat$Class)
#> [1] Class2 Class2 Class2 Class2 Class2 Class1
#> Levels: Class1 Class2

train(
  # specify variables
  x = train_dat[, 1:10], 
  y = train_dat[, "Class"], 
  # specify an adaboost model
  method = "ada",
  # optimize over area under the ROC curve
  metric = "ROC",
  # 10-fold cross-validation; class probabilities are needed for ROC
  trControl = trainControl(
    method = "cv", 
    number = 10, 
    classProbs = TRUE,
    summaryFunction = twoClassSummary
  )
)
#> Boosted Classification Trees 
#> 
#> 100 samples
#>  10 predictor
#>   2 classes: 'Class1', 'Class2' 
#> 
#> No pre-processing
#> Resampling: Cross-Validated (10 fold) 
#> Summary of sample sizes: 90, 89, 90, 91, 90, 90, ... 
#> Resampling results across tuning parameters:
#> 
#>   maxdepth  iter  ROC        Sens       Spec 
#>   1          50   0.8028333  0.8100000  0.560
#>   1         100   0.8130000  0.8400000  0.605
#>   1         150   0.8410000  0.8233333  0.610
#>   2          50   0.8128333  0.8433333  0.610
#>   2         100   0.8436667  0.8100000  0.645
#>   2         150   0.8543333  0.8433333  0.690
#>   3          50   0.8560000  0.8100000  0.725
#>   3         100   0.8636667  0.8433333  0.700
#>   3         150   0.8826667  0.8266667  0.745
#> 
#> Tuning parameter 'nu' was held constant at a value of 0.1
#> ROC was used to select the optimal model using the largest value.
#> The final values used for the model were iter = 150, maxdepth = 3 and nu = 0.1.

Created on 2022-12-16 with reprex v2.0.2
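
One aside before continuing: since you asked about stumps specifically, caret lets you pin the tree depth at 1 through tuneGrid while still tuning the number of boosting iterations. A hedged sketch reusing train_dat from above (the grid values are illustrative):

train(
  x = train_dat[, 1:10],
  y = train_dat[, "Class"],
  method = "ada",
  metric = "ROC",
  # fix maxdepth = 1 so every base learner is a stump
  tuneGrid = expand.grid(
    iter = c(50, 100, 150),
    maxdepth = 1,
    nu = 0.1
  ),
  trControl = trainControl(
    method = "cv",
    number = 10,
    classProbs = TRUE,
    summaryFunction = twoClassSummary
  )
)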

Continuing from that excerpt:

...That package's [ada's] interface is quite principled, so if you'd like to make use of it with parsnip, you could wire up your own parsnip engine for it without too many complications!

If you opt to set up a custom parsnip model configuration, the interface would feel something like:

library(tidymodels)

fit_resamples(
  boost_tree(mode = "classification", engine = "ada"),
  Class ~ .,                      # your model formula
  resamples = vfold_cv(train_dat),
  metrics = metric_set(roc_auc)
)
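
For reference, the registration itself would look roughly like the sketch below. The set_model_engine(), set_fit(), set_encoding(), and set_pred() calls are parsnip's documented extension API, but the specific argument mappings onto ada() here are my unvetted assumptions, not a tested engine:

library(parsnip)

# register "ada" as a new engine for the existing boost_tree() model
set_model_engine("boost_tree", mode = "classification", eng = "ada")
set_dependency("boost_tree", eng = "ada", pkg = "ada")

# map parsnip's `trees` argument onto ada()'s `iter`
set_model_arg(
  model = "boost_tree",
  eng = "ada",
  parsnip = "trees",
  original = "iter",
  func = list(pkg = "dials", fun = "trees"),
  has_submodel = FALSE
)

# fit via ada's formula interface
set_fit(
  model = "boost_tree",
  eng = "ada",
  mode = "classification",
  value = list(
    interface = "formula",
    protect = c("formula", "data"),
    func = c(pkg = "ada", fun = "ada"),
    defaults = list()
  )
)

set_encoding(
  model = "boost_tree",
  eng = "ada",
  mode = "classification",
  options = list(
    predictor_indicators = "none",
    compute_intercept = FALSE,
    remove_intercept = FALSE,
    allow_sparse_x = FALSE
  )
)

# class predictions: predict.ada(type = "vector") returns a factor
set_pred(
  model = "boost_tree",
  eng = "ada",
  mode = "classification",
  type = "class",
  value = list(
    pre = NULL,
    post = NULL,
    func = c(fun = "predict"),
    args = list(object = quote(object$fit), newdata = quote(new_data), type = "vector")
  )
)

# class probabilities (needed for roc_auc): predict.ada(type = "probs")
# returns a matrix, assumed here to be in the order of the outcome levels
set_pred(
  model = "boost_tree",
  eng = "ada",
  mode = "classification",
  type = "prob",
  value = list(
    pre = NULL,
    post = function(result, object) {
      colnames(result) <- object$lvl
      tibble::as_tibble(result)
    },
    func = c(fun = "predict"),
    args = list(object = quote(object$fit), newdata = quote(new_data), type = "probs")
  )
)

With those pieces in place, the fit_resamples() call above should run as written.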

Hope this is helpful for you. :slight_smile:
