Guide on evaluating classification models

I've been planning to write a short essay (not homework, just for self-learning) on the various metrics used to assess the performance of "classification" models (including models like logistic regression, which actually produces probabilistic predictions but is often used in a classification context). The goal is to introduce the basic idea behind each metric, its pros and cons, and maybe finish with a case study.

I would like to start with common measures such as the confusion matrix, accuracy, sensitivity, specificity, and AUC, and explain the criticisms people have raised about them. Then I would proceed to other measures like the log-likelihood, R^2, the Brier score, etc.
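
For concreteness, here is a minimal sketch (in Python with scikit-learn, on made-up toy labels and probabilities, not output from a real model) of how I've been computing the threshold-based measures alongside the probability-based ones:

```python
import numpy as np
from sklearn.metrics import (
    confusion_matrix, accuracy_score, roc_auc_score,
    log_loss, brier_score_loss,
)

# Toy data: true binary labels and predicted probabilities
# (purely illustrative values).
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.7])
y_pred = (y_prob >= 0.5).astype(int)  # classify at a 0.5 cutoff

# Threshold-based metrics (all depend on the chosen cutoff).
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = accuracy_score(y_true, y_pred)
sensitivity = tp / (tp + fn)  # true positive rate
specificity = tn / (tn + fp)  # true negative rate

# Probability-based metrics (no cutoff needed).
auc = roc_auc_score(y_true, y_prob)       # rank-based discrimination
nll = log_loss(y_true, y_prob)            # negative mean log-likelihood
brier = brier_score_loss(y_true, y_prob)  # mean squared error of probabilities
```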

As I read more articles (mostly originating from Dr Harrell's blog), I came across the concept of a scoring rule, which seems to have important applications here. Yet most articles I find explain scoring rules in the context of decision science rather than machine learning, which is beyond me, and I feel I'm losing track of the big picture because there are just too many metrics. I figured I may need to know:

  • the history / timeline of important metrics: what context each arose in and what problem it solves, and which metrics may complement one another

  • the big categories of metrics: for example, the c-index seems to be based on the log-likelihood, so perhaps both could be labelled as something like "likelihood-based"?

  • the known advantages and limitations of certain ways of formulating metrics

  • a more detailed description of scoring rules in the context of classification models rather than decision theory (I've put a small sketch of my current understanding after this list)

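On that last point, the working definition I've pieced together so far is that a scoring rule is "proper" if its expected value is optimized by reporting the true probability. A small numerical sketch of that property for the Brier score (all numbers illustrative):

```python
import numpy as np

# Suppose the true event probability is p = 0.3. If we report a
# probability q, the expected Brier score is
#   E[(q - Y)^2] = p * (q - 1)**2 + (1 - p) * q**2,
# which is minimized at q = p -- the defining "proper" property.
p = 0.3
q = np.linspace(0, 1, 101)
expected_brier = p * (q - 1) ** 2 + (1 - p) * q ** 2

best_q = q[np.argmin(expected_brier)]
print(f"Expected Brier score is minimized at q = {best_q:.2f} (true p = {p})")
# -> minimized at q = 0.30: honest probability reporting is optimal.
```

(Accuracy, by contrast, is not proper: its expectation is optimized by reporting any q on the same side of the cutoff as p, which I gather is part of the criticism I'd like to explain.)
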
If anyone happens to know of an article, book, blog post, literature review, or paper that gives a comprehensive explanation of some of these topics, or of anything else you find relevant, I would be very pleased to hear about it!

Thanks.
