I ask this question in the hope of being steered in the right direction.
I am running several machine learning models (RF, SVM, etc.) using caret. I would like to generate a confusion matrix that lets me trace back to the actual observations that were misclassified. Is this possible?
For example, my dataset contains protein sequences as its observations. What I would like to obtain is a list of the proteins that have been misclassified. My working hypothesis is quite simple: are proteins shorter than some length X (e.g. fewer than 25 amino acids) misclassified at a higher rate?
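To make the goal concrete, here is a minimal base R sketch of the output I am after. Everything in it is an assumption for illustration: the data frame `results`, its columns (`protein_id`, `seq_length`, `obs`, `pred`), the class labels, and the cutoff of 25 are all invented, and the predictions are typed in by hand rather than taken from a fitted model. I understand that caret can keep held-out predictions via `trainControl(savePredictions = "final")`, which might supply the observed and predicted columns, but the sketch does not depend on that.

```r
# Toy data: every value here is invented for illustration only.
results <- data.frame(
  protein_id = c("P1", "P2", "P3", "P4", "P5", "P6"),
  seq_length = c(12, 18, 24, 40, 75, 130),
  obs  = factor(c("A", "B", "A", "B", "A", "B"), levels = c("A", "B")),
  pred = factor(c("B", "B", "B", "B", "A", "A"), levels = c("A", "B")),
  stringsAsFactors = FALSE
)

# Confusion matrix (rows = predicted class, columns = observed class).
conf_mat <- table(Predicted = results$pred, Observed = results$obs)

# Flag each observation, then keep only the misclassified rows.
results$misclassified <- results$pred != results$obs
misclassified_proteins <- results[results$misclassified, ]

# Misclassification rate for short versus longer sequences.
length_cutoff <- 25  # hypothetical threshold X
results$is_short <- results$seq_length < length_cutoff
rate_by_length <- tapply(results$misclassified, results$is_short, mean)

print(conf_mat)
print(misclassified_proteins)
print(rate_by_length)
```

In `rate_by_length`, the entry named `TRUE` is the rate for sequences below the cutoff and `FALSE` is the rate for the rest. With real data I would presumably follow this with a proper test (for example `fisher.test` on the short-by-misclassified table) rather than comparing two raw rates.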