Is even an untuned gradient boosting model that good?

Hi. I am trying to understand how to use gradient boosting. Below is a small dataset and an admittedly poor-fitting logistic regression model whose AUC on the test data is 0. An untuned gradient boosting model then produces an AUC of 0.417. Now 0.417 on its own is pretty poor, but compared with 0 the improvement looks huge. Does such a large improvement make sense? Is gradient boosting really that good? Please let me know if my code is incorrect. Thank you.
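
For reference when reading these numbers: AUC is the fraction of (positive, negative) pairs in which the positive case receives the higher score, so 0.5 corresponds to random ranking and 0 to a perfectly reversed ranking. A minimal sketch with made-up scores (not the admissions data); `auc_by_pairs` is a hypothetical helper written for illustration, not part of ROCR or mlr:

```r
# Hypothetical helper: AUC as the share of (positive, negative) pairs
# in which the positive case has the higher score (ties count as half).
auc_by_pairs <- function(score, label) {
  pos <- score[label == 1]
  neg <- score[label == 0]
  cmp <- outer(pos, neg, function(p, n) (p > n) + 0.5 * (p == n))
  mean(cmp)
}

# Made-up example data, not the admissions data
label <- c(0, 0, 0, 1, 1)
score <- c(0.9, 0.8, 0.7, 0.2, 0.1)  # every positive scored below every negative

auc_by_pairs(score, label)    # 0: perfectly ordered, but in the wrong direction
auc_by_pairs(-score, label)   # 1: the same scores with the sign flipped
```

If that reading applies here, an AUC of 0 would point to scores that separate the classes perfectly but in the opposite direction, rather than to a model with no signal at all.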

library(ROCR)
library(mlr)
library(xgboost)

data("UCBAdmissions")
df <- data.frame(UCBAdmissions)
df$Admit <- as.factor(ifelse(df$Admit == "Admitted", 0, 1))
set.seed(1234)
samp.size <- floor(0.80 * nrow(df))
train_ind <- sample(seq_len(nrow(df)), size = samp.size)
train <- df[train_ind, ]
test <- df[-train_ind, ]

# 1. Poor-fitting logistic regression

model <- glm(Admit ~ Gender + Dept + Freq, data = train, family = "binomial")
# Default type = "link" returns log-odds, which rank cases the same way as probabilities
pred <- predict(model, test)
ROCRpred <- prediction(pred, test$Admit)
ROCRperf <- ROCR::performance(ROCRpred, "tpr", "fpr")
auc <- ROCR::performance(ROCRpred, measure="auc")
auc <- auc@y.values[[1]]
paste0("Logistic auc: ", round(auc,3))

df <- createDummyFeatures(df, target = "Admit",
                          cols = c("Gender", "Dept"))
set.seed(1234)
samp.size <- floor(0.80 * nrow(df))
train_ind <- sample(seq_len(nrow(df)), size = samp.size)
train <- df[train_ind, ]
test <- df[-train_ind, ]
# positive is documented as a character string naming a factor level of the target
trainTask <- makeClassifTask(data = train, target = "Admit", positive = "1")
testTask <- makeClassifTask(data = test, target = "Admit", positive = "1")
set.seed(1)
xgb_learner <- makeLearner(
  "classif.xgboost",
  predict.type = "prob",
  par.vals = list(
    objective = "binary:logistic",
    eval_metric = "auc",
    nrounds = 200
  )
)

# 2. Gradient boosting model

xgb_model <- train(xgb_learner, task = trainTask)
result <- predict(xgb_model, testTask)

head(result$data) # contains predictions

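# Columns are selected by position: column 2 should be the true label and
# column 3 a class probability. Check with head(result$data) that column 3 is
# the probability of the positive class (prob.1) and not prob.0.
ROCRpred <- prediction(result$data[,3], result$data[,2])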
ROCRperf <- ROCR::performance(ROCRpred, "tpr", "fpr")
auc <- ROCR::performance(ROCRpred, measure="auc")
auc <- auc@y.values[[1]]
paste0("XGB auc: ", round(auc,3))

[1] "Logistic auc: 0"

[1] "XGB auc: 0.417"
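
A quick size check may help put both numbers in context. This is a minimal sketch, using base R only, of how many rows the 80/20 split above leaves for testing; the names `adm`, `n_total`, `n_train` and `n_test` are illustrative and not part of the code above:

```r
# Rebuild the data frame as in the post, under a separate name (base R only)
data("UCBAdmissions")
adm <- data.frame(UCBAdmissions)

n_total <- nrow(adm)                 # one row per Admit x Gender x Dept cell
n_train <- floor(0.80 * n_total)     # same 80% rule as above
n_test  <- n_total - n_train

c(total = n_total, train = n_train, test = n_test)   # 24, 19, 5

# Freq counts the applicants in each cell; it is not a per-applicant feature
sum(adm$Freq)
```

With only five test rows, a single row changing rank can move the AUC a long way, so both 0 and 0.417 may say more about this particular split than about either model.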
