Serious problem with classification

Hello. I am new to RStudio and am currently working on my MSc thesis. As a first step, I need to classify a crash data set with several supervised classification methods, such as a neural net, SVM, kNN, rpart, or random forest. My biggest problem is that even after I normalize the data, tune the parameters, split the data into training and test sets, load them into the script, and run the classification, I get a very low kappa and a bad confusion matrix!
The data also has an equal number of records for each class. I really need help to figure this out.
Thank you.

Hi @msabam,

What do you need to figure out? Do you think there is a bug in your code that is causing the poor model performance? Or do you just wish the model performed better?

Depending on your training process, you may be severely overfitting to the training set, which would likely lead to poor performance on a validation hold-out.
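
One way to tell overfitting apart from a bug is to compute kappa on both the training set and the hold-out set. A high training kappa with a low hold-out kappa points to overfitting, while a low kappa on both suggests a problem in the data or the pipeline. Below is a minimal sketch of that comparison. It uses only Python's standard library to stay self-contained (the thread itself works in R), the label vectors are made up for illustration, and `cohen_kappa` is a hypothetical helper, not a function from any package mentioned here.

```python
from collections import Counter

def cohen_kappa(actual, predicted):
    """Cohen's kappa: agreement between two label lists, corrected for chance."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(actual)
    # Observed agreement: fraction of positions where the labels match.
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    # Chance agreement: from the marginal class frequencies of each list.
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    expected = sum(actual_counts[c] * predicted_counts[c] for c in actual_counts) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when both lists contain a single identical class.
        # Returning 0.0 here is an assumption made for this sketch.
        return 0.0
    return (observed - expected) / (1 - expected)

# Made-up labels: the model reproduces the training labels perfectly
# but gets only half of the hold-out labels right.
train_actual = ["a", "a", "b", "b", "c", "c"]
train_predicted = ["a", "a", "b", "b", "c", "c"]
holdout_actual = ["a", "a", "b", "b", "c", "c"]
holdout_predicted = ["a", "b", "b", "c", "c", "a"]

train_kappa = cohen_kappa(train_actual, train_predicted)
holdout_kappa = cohen_kappa(holdout_actual, holdout_predicted)
print(f"training kappa = {train_kappa:.2f}, hold-out kappa = {holdout_kappa:.2f}")
```

A large gap between the two numbers, as in this toy example, is the overfitting pattern. If both were low, I would look at the data preparation first.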

No, I think there is a bug causing the poor model performance, and I cannot work out what it is. I randomly sample 70% of the data for training and use the other 30% for testing. I can upload the code here. I wish I could upload the data as well so that you could see it. Maybe the data is the problem.

Have you split the 70% further into a training set and a testing set? Tune the model on those, then run that model against the 30% held out for validation. I will post a tutorial link when I get home.
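
Roughly, the nested split looks like the sketch below. It is a standard-library Python illustration rather than R, and everything in it is an assumption for demonstration: the synthetic three-class data stands in for the crash data, the class names are placeholders, and the tiny k-nearest-neighbour classifier is only there so that there is a parameter to tune.

```python
import random
from collections import Counter

random.seed(42)  # fixed seed so the split is reproducible

# Synthetic stand-in for the real data: 3 balanced classes, 2 features each.
centers = {"class_a": (0.0, 0.0), "class_b": (4.0, 0.0), "class_c": (0.0, 4.0)}
data = [((random.gauss(cx, 1.0), random.gauss(cy, 1.0)), label)
        for label, (cx, cy) in centers.items() for _ in range(100)]
random.shuffle(data)

# Outer split: 70% for model development, 30% held out until the very end.
cut = len(data) * 7 // 10
development, holdout = data[:cut], data[cut:]

# Inner split of the 70%: fit on `train`, compare settings on `validation`.
inner_cut = len(development) * 7 // 10
train, validation = development[:inner_cut], development[inner_cut:]

def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def knn_predict(train_rows, point, k):
    """Majority vote among the k training rows nearest to `point`."""
    nearest = sorted(train_rows, key=lambda row: squared_distance(row[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train_rows, eval_rows, k):
    """Fraction of `eval_rows` classified correctly using `train_rows`."""
    hits = sum(knn_predict(train_rows, x, k) == y for x, y in eval_rows)
    return hits / len(eval_rows)

# Tune k using the inner validation set only; the hold-out is never touched here.
candidate_ks = [1, 3, 5, 7, 9]
best_k = max(candidate_ks, key=lambda k: accuracy(train, validation, k))

# One final check: use all development data and score on the untouched hold-out.
holdout_accuracy = accuracy(development, holdout, best_k)
print(f"best k = {best_k}, hold-out accuracy = {holdout_accuracy:.2f}")
```

The point is that the 30% is scored only once, after all tuning decisions have been made on the inner split, so the final number is not inflated by the tuning itself.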

See Chapter 27 of the HarvardX Data Science textbook for a walk-through.
