Comparing Supervised Learning Algorithms


#1

Hello, I am comparing two supervised classification algorithms on some training data. The table below gives the 10-fold cross-validation accuracies for the two algorithms. What I would like to know is: at what confidence level can one claim that Algorithm 1 outperforms Algorithm 2?
I would very much appreciate any help with this problem.
Kind regards
Ronnie!

CV Fold        Algorithm 1          Algorithm 2
1                91.11                 90.70
2                90.48                 90.52
3                91.87                 90.88
4                90.52                 90.87
5                89.88                 90.02
6                89.77                 88.99
7                91.44                 90.98
8                90.88                 91.44
9                90.77                 90.77
10               90.89                 90.92


#2

This problem is what caret::resamples and (even better) tidyposterior are focused on.

You might want to stay away from confidence intervals and go Bayesian (which is not difficult here). This paper (pdf) does a good job of explaining why you would want to do that, although I think that their statistical model is too prescriptive. tidyposterior makes it easy to make real probability statements about the differences between models. You can't do that with confidence intervals because of how they are constructed.
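For reference, the confidence-interval approach the question asks about would be a paired t-test on the fold-wise accuracies. A base-R sketch with your numbers (with the caveat that CV folds share training data, so they are not truly independent and the classical interval tends to be optimistic):

```r
# Paired t-test on the fold-wise accuracies from the table above.
# Caveat: resampled folds are not independent, so this classical
# interval is only a rough guide.
alg1 <- c(91.11, 90.48, 91.87, 90.52, 89.88, 89.77, 91.44, 90.88, 90.77, 90.89)
alg2 <- c(90.70, 90.52, 90.88, 90.87, 90.02, 88.99, 90.98, 91.44, 90.77, 90.92)

res <- t.test(alg1, alg2, paired = TRUE, conf.level = 0.95)
res$estimate  # mean difference: 0.152
res$conf.int  # the 95% interval spans zero: no significant difference at that level
```

Even at 95%, the interval on the mean difference includes zero, which is consistent with the weak Bayesian result below.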

Here's an example with your data. You can run

library(dplyr)
library(tidyposterior)

cv_result <- tribble(
  ~id,         ~Algorithm_1,          ~Algorithm_2,
  "1",                91.11,                 90.7,
  "2",                90.48,                 90.52,
  "3",                91.87,                 90.88,
  "4",                90.52,                 90.87,
  "5",                89.88,                 90.02,
  "6",                89.77,                 88.99,
  "7",                91.44,                 90.98,
  "8",                90.88,                 91.44,
  "9",                90.77,                 90.77,
  "10",               90.89,                 90.92
)

# Fit a Bayesian hierarchical model to the resampling results
bayes_model <- perf_mod(cv_result, seed = 3806)

to get a model that compares the two algorithms, and then use the summary methods to get the probability statements:

> # Credible intervals for each model
> bayes_model %>% tidy() %>% summary()
# A tibble: 2 x 4
  model        mean lower upper
  <chr>       <dbl> <dbl> <dbl>
1 Algorithm_1  12.7  1.66  24.4
2 Algorithm_2  12.6  1.81  24.3
> 
> # Results on the difference in performance
> contrast_models(bayes_model, seed = 3451) %>% summary()
# A tibble: 1 x 9
  contrast                   probability   mean lower upper  size pract_neg pract_equiv pract_pos
  <chr>                            <dbl>  <dbl> <dbl> <dbl> <dbl>     <dbl>       <dbl>     <dbl>
1 Algorithm_1 vs Algorithm_2       0.528 0.0633 -1.80  1.90    0.        NA          NA        NA
> # The probability that Algorithm 1 is better than Algorithm 2 is 52.8% (these are accuracies, so larger is better)

The ROPE estimates described in the paper can be used via the size argument to the summary method on contrast_models (see the website for examples).
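A minimal sketch of that, continuing from the `bayes_model` fitted above and using 0.5 accuracy points as an arbitrary, illustrative practical-equivalence threshold:

```r
# size = 0.5 declares differences smaller than half an accuracy point
# practically equivalent; the pract_* columns are then populated.
contrast_models(bayes_model, seed = 3451) %>%
  summary(size = 0.5)
```

With a threshold that wide, most of the posterior for the difference will likely fall in the practically-equivalent region, matching the ~53% probability above.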