AUC/ROC Plot in EVtree Package

Hi All,

I've been working with an 'evolutionary tree' package known as 'evtree,' but unfortunately I've run into some obstacles with accuracy-type measures for the package.

I wondered if anyone could provide some code for obtaining an AUC for classification-type data, and an MSE for regression-type analysis.

The authors have provided some example code:

X1 <- rep(seq(0.25, 1.75, 0.5), each = 4)
X2 <- rep(seq(0.25, 1.75, 0.5), 4)
Y <- rep(1, 16)
Y[(X1 < 1 & X2 < 1) | (X1 > 1 & X2 > 1)] <- 2
Y <- factor(Y, labels = c("O", "X"))
chess22 <- data.frame(Y, X1, X2)

## trees
library("evtree")
set.seed(1090)
evtree(Y ~ ., data = chess22, minbucket = 1, minsplit = 2)

I would appreciate it if someone could help with some code for generating accuracy-type measures like AUC/MSE after fitting with the evtree function.

Thanks

Have you looked at the yardstick package? It has ROC along with a bunch of other metrics. It's part of the tidymodels meta-package.

You'd need to save your model into something, like:

my_model <- evtree(Y ~ ., data = chess22, minbucket = 1, minsplit = 2)

And then predict on a dataset where you have a known outcome. Usually the prediction is done with the generic function predict:

predicted <- predict(my_model, some_known_outcome_data)

That way you have your true outcome, and your prediction outcome ready for the ROC in the yardstick package.
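One thing that may help with the ROC part: as far as I know, evtree returns a partykit "party" object, so predict() should also accept type = "prob" and return class probabilities rather than hard labels. Here is a minimal sketch under that assumption, reusing the chess22 data from the original post. The names pred_class and pred_prob are just illustrative.

```r
library(evtree)

# Rebuild the chess22 data from the original post
X1 <- rep(seq(0.25, 1.75, 0.5), each = 4)
X2 <- rep(seq(0.25, 1.75, 0.5), 4)
Y <- rep(1, 16)
Y[(X1 < 1 & X2 < 1) | (X1 > 1 & X2 > 1)] <- 2
Y <- factor(Y, labels = c("O", "X"))
chess22 <- data.frame(Y, X1, X2)

set.seed(1090)
my_model <- evtree(Y ~ ., data = chess22, minbucket = 1, minsplit = 2)

# Hard class labels (the default prediction type)
pred_class <- predict(my_model, newdata = chess22)

# Class probabilities, one column per factor level of Y
# (assumes the partykit predict method's type = "prob" option)
pred_prob <- predict(my_model, newdata = chess22, type = "prob")
head(pred_prob)
```

A probability column from pred_prob is what you would normally hand to an ROC/AUC function, since it keeps the information about how sure the model is.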

install.packages('yardstick')
library(yardstick)

head(two_class_example)
metrics(two_class_example, truth, predicted)
roc_auc(two_class_example, truth, Class1)

Example stolen from here:

Thanks @Hayward. Really appreciate your feedback

Struggling a little with some of the arguments, would love your thoughts

For example, what would be the difference between 'two_class_example' and 'the truth'? I thought these were both the dataset?

And in keeping with the example, why would Class1 (in a binary model) be an argument in the roc_auc function?

two_class_example is a dataset of predictions that has 'the truth' in its first column. The other three columns are predictions: the model's predicted probability that the result is 'class 1'; its predicted probability that the result is 'class 2'; and the final binary guess, 'predicted'.
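To make that layout concrete, here is a small hand-built data frame with the same four columns. The values are invented purely for illustration, and toy and lv are made-up names. As far as I know, yardstick treats the first factor level of truth as the event of interest by default, which is why the Class1 probability column is the one passed to roc_auc().

```r
# Toy data with the same column layout as two_class_example (values invented)
lv <- c("Class1", "Class2")
toy <- data.frame(
  truth  = factor(c("Class1", "Class2", "Class1", "Class2", "Class1"), levels = lv),
  Class1 = c(0.90, 0.20, 0.60, 0.40, 0.35)  # predicted probability of Class1
)
toy$Class2 <- 1 - toy$Class1                # predicted probability of Class2

# Final binary guess: pick whichever class has the higher probability
toy$predicted <- factor(ifelse(toy$Class1 >= 0.5, "Class1", "Class2"), levels = lv)
toy

# With yardstick loaded, the matching call would be:
# roc_auc(toy, truth, Class1)
```

So the data frame is the first argument, truth names the column of known outcomes, and Class1 names the column of predicted probabilities for the event level.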

(It's a bit different from chess22, because chess22 is 'the truth' and two independent, explanatory variables.) If you don't have likelihood predictions from your model forecast, you could estimate the ROC by simply turning your binary prediction into 1 or 0 rather than a likelihood, but a binary guess is less informative since it doesn't include information about how unsure you are.


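library(evtree)
library(yardstick)
library(dplyr)
my_model <- evtree(Y ~ ., data = chess22, minbucket = 1, minsplit = 2)
predicted <- predict(my_model, chess22)  # hard class labels, "O" or "X"
# Turn the labels into 0/1 scores: class1 is 1 for "X", class2 is 1 for "O"
outcome <- bind_cols(truth = chess22[['Y']], predicted = predicted) %>%
  mutate(class1 = as.numeric(predicted) - 1, class2 = 1 - class1)
# yardstick treats the first factor level ("O") as the event, so pass class2
roc_auc(outcome, truth, class2)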

Since the tree fitted to the simulated chess22 data classifies every point correctly, the AUC computed from the hard predictions (without likelihoods) is also perfect: it is 1, which is really not all that interesting.

  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 roc_auc binary             1
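On the regression half of the original question: MSE is just the mean of the squared differences between the observed and predicted values, so a small base R helper is enough. This is only a sketch; mse is a helper defined here, not a package function, and the numbers are made up. If you prefer to stay within yardstick, it provides rmse() for the root of the same quantity.

```r
# Mean squared error between observed and predicted numeric values
mse <- function(truth, estimate) {
  stopifnot(length(truth) == length(estimate))
  mean((truth - estimate)^2)
}

# Hypothetical observed and predicted values, for illustration only
truth    <- c(3.0, 5.0, 2.5, 7.0)
estimate <- c(2.5, 5.0, 4.0, 8.0)

mse(truth, estimate)        # mean squared error
sqrt(mse(truth, estimate))  # root mean squared error

# With a fitted regression tree (numeric response), the helper would be used as
#   mse(test_data$y, predict(reg_model, newdata = test_data))
# where reg_model and test_data are placeholders for your own objects.
```

Either metric is more meaningful on held-out data than on the data the tree was trained on, since a flexible tree can fit its training set very closely.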

Thank you @Hayward. I appreciate your help.
