I currently have four supervised text classifiers: Naive Bayes, Semi-Naive Bayes, Bernoulli Naive Bayes, and SVM. As a performance metric, I am using the F1 score. Because the F1 scores of all four classifiers fall within a narrow margin of one another, I also want to determine statistically which model is best. From what I have read online, there are three candidate tests:
- ANOVA test
- Friedman test
- McNemar's test (only applies when comparing 2 classifiers)
Which test do you suggest I use, or is there a better-suited test for my task?
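
To make the setup concrete, here is a minimal sketch of how I imagine the Friedman test would be run, assuming I had one F1 score per cross-validation fold for each classifier, with every classifier evaluated on the same folds. The scores below are made-up placeholders, not my actual results, and `friedman_on_folds` is just a hypothetical helper name; the only library call is `scipy.stats.friedmanchisquare`.

```python
from scipy.stats import friedmanchisquare

# Placeholder per-fold F1 scores (made-up numbers, NOT real results).
# Each list holds one classifier's F1 on the same 10 cross-validation folds.
f1_scores = {
    "naive_bayes":           [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.80, 0.81, 0.79],
    "semi_naive_bayes":      [0.82, 0.80, 0.82, 0.81, 0.83, 0.79, 0.83, 0.81, 0.80, 0.80],
    "bernoulli_naive_bayes": [0.80, 0.78, 0.81, 0.79, 0.81, 0.77, 0.82, 0.79, 0.80, 0.78],
    "svm":                   [0.83, 0.81, 0.84, 0.82, 0.83, 0.80, 0.85, 0.82, 0.82, 0.81],
}


def friedman_on_folds(scores_by_model):
    """Run the Friedman test on paired per-fold scores.

    Returns (statistic, p_value). The test needs at least three
    classifiers, all scored on the same folds.
    """
    samples = list(scores_by_model.values())
    # The test is paired: every classifier must have one score per fold.
    if len({len(s) for s in samples}) != 1:
        raise ValueError("all classifiers need the same number of folds")
    statistic, p_value = friedmanchisquare(*samples)
    return statistic, p_value


statistic, p_value = friedman_on_folds(f1_scores)
print(f"Friedman statistic = {statistic:.3f}, p-value = {p_value:.4f}")
```

As far as I understand, a small p-value here would only suggest that at least one classifier differs from the others, so a post-hoc pairwise comparison would presumably still be needed to say which one.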
Many thanks.