Here's an example script that fits an xgboost model using the evaluation metric prAUC (`prSummary` in `trainControl`, with `metric = "AUC"` in `train`):

```r
library(caret)
library(tidyverse)
# example data: binary target ("Premium" cut vs. not), with valid R factor levels
sampledata <- diamonds %>%
  mutate(target = ifelse(cut == "Premium", 1, 0) %>% make.names() %>% as.factor())
# fit an XGB model
train_control <- trainControl(
  method = "cv",
  number = 5,
  classProbs = TRUE,
  verboseIter = TRUE,
  summaryFunction = prSummary,  # reports prAUC (labelled "AUC"), precision, recall, F
  savePredictions = TRUE,       # keep the held-out predictions from each fold
  allowParallel = TRUE
)
## tuning grid (a single hyperparameter combination)
tune_grid <- expand.grid(
  nrounds = 200,
  max_depth = 5,
  eta = 0.05,
  gamma = 0.01,
  colsample_bytree = 0.75,
  min_child_weight = 0,
  subsample = 0.5
)
## xgb
xgb_model <- train(
  x = select(sampledata, -c(cut, target, clarity, color)),
  y = sampledata$target,
  method = "xgbTree",
  metric = "AUC",            # actually prAUC, since prSummary is used
  trControl = train_control,
  tuneGrid = tune_grid       # tuneLength is ignored when tuneGrid is supplied
)
```

After running this, printing `xgb_model` shows a (pr)AUC of 0.978. My question is: is it possible to retrieve the thresholds that were used in this calculation?
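For context, because `savePredictions = TRUE`, the held-out cross-validation predictions are available in `xgb_model$pred`. A sketch of recomputing the metric from them, assuming `prSummary` delegates to `MLmetrics::PRAUC` and uses caret's convention of treating the first factor level as the positive class (here `"X0"`, since the levels sort alphabetically):

```r
library(MLmetrics)

preds <- xgb_model$pred        # held-out predictions, one row per CV observation
pos   <- levels(preds$obs)[1]  # caret's positive class: the first factor level

# Recompute prAUC from the pooled CV predictions. This should be close to,
# though not necessarily identical to, the value caret reports, which
# averages the per-fold metric rather than pooling.
PRAUC(y_pred = preds[[pos]], y_true = ifelse(preds$obs == pos, 1, 0))
```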

A friend is running a similar model in Python with scikit-learn, and we'd like to compare our models more closely. See https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html

From that link: "This implementation is not interpolated and is different from computing the area under the precision-recall curve with the trapezoidal rule, which uses linear interpolation and can be too optimistic". If I could determine how many cut-offs caret is using, we could do a more like-for-like comparison of our models.

I inspected `xgb_model` with `glimpse()` but could not see anything called "threshold", though I'm sure I've come across the term in an R context in the past. Is there a way to get these thresholds?
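One route I've been considering, in case it helps frame the question: assuming `MLmetrics::PRAUC` builds its precision-recall curve with ROCR (a package MLmetrics imports), the cutoffs would simply be the unique predicted probabilities, and could be reconstructed from the saved CV predictions:

```r
# Sketch only: assumes the PR curve is built by ROCR, in which case the
# cutoffs used to trace it are stored in the performance object's
# alpha.values slot.
library(ROCR)

preds    <- xgb_model$pred
pos      <- levels(preds$obs)[1]  # caret's positive class (first factor level)
pred_obj <- prediction(preds[[pos]], ifelse(preds$obs == pos, 1, 0))
perf_obj <- performance(pred_obj, measure = "prec", x.measure = "rec")

cutoffs <- perf_obj@alpha.values[[1]]  # one cutoff per unique predicted probability
length(cutoffs)                        # how many thresholds trace the curve
```

But I don't know whether this reconstruction matches what caret actually did internally, which is why I'm asking.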