Statistically comparing supervised text classifiers

I currently have four supervised text classifiers: Naive Bayes, Semi-Naive Bayes, Bernoulli Naive Bayes, and SVM. As a performance metric, I am currently using the F1 score. I would also like to determine the best model statistically, since the F1 scores of all the classifiers fall within the same margin. In the online literature I have found three types of tests:

  1. ANOVA test
  2. Friedman test
  3. McNemar's test (only applies to comparing two classifiers)

Which test would you suggest I use, or is there a better-suited test for my task?

Many thanks

McNemar's test isn't great, since it only uses the off-diagonal entries of the 2x2 table of paired predictions (the cases where the two classifiers disagree) and tests whether those counts are about equal.
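
For reference, here is a minimal sketch of McNemar's test in R, assuming hypothetical vectors `truth`, `pred_a`, and `pred_b` that hold the test-set labels and the two classifiers' predictions:

```r
# Whether each classifier got each test case right
correct_a <- pred_a == truth
correct_b <- pred_b == truth

# 2x2 agreement table; McNemar's test only uses the two discordant
# (off-diagonal) cells, where one classifier is right and the other wrong
agreement <- table(correct_a, correct_b)
mcnemar.test(agreement)
```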

The tests you listed are typically used on the test set. You should only use that data at the end, after you've selected one or two models to keep.
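
If you do want one of the classical tests for comparing more than two classifiers on resampled data, the Friedman test is the usual choice. A minimal sketch in R, assuming a hypothetical matrix `f1_scores` with one row per cross-validation fold and one column per classifier (the data below are placeholders):

```r
set.seed(1)
# Placeholder per-fold F1 scores: 10 folds (rows) x 4 classifiers (columns)
f1_scores <- matrix(
  runif(40, min = 0.70, max = 0.90),
  nrow = 10,
  dimnames = list(NULL, c("nb", "semi_nb", "bernoulli_nb", "svm"))
)

# friedman.test() ranks the classifiers within each fold and tests
# whether the mean ranks differ across folds
friedman.test(f1_scores)
```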

You might want to test the resampled performance metrics using tidyposterior. It fits a Bayesian model to the resampling statistics, and you can use that to get probabilities of superiority or practical equivalence. Take a look at Section 11.4 of Tidy Modeling with R.
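
As a sketch of what that looks like, assuming a hypothetical data frame `f1_resamples` with an `id` column (one row per resample) and one numeric column of F1 scores per classifier (the column names here are made up):

```r
library(tidyposterior)

# Fit a Bayesian hierarchical model to the resampled F1 scores
f1_model <- perf_mod(f1_resamples, seed = 123)

# Posterior distributions of the mean F1 for each classifier
summary(tidy(f1_model))

# Posterior for the difference between two of the classifiers; size = 0.02
# treats F1 differences smaller than 0.02 as practically equivalent
svm_vs_nb <- contrast_models(f1_model, list_1 = "svm", list_2 = "nb", seed = 123)
summary(svm_vs_nb, size = 0.02)
```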
