One Class Classification in R language. What am I doing wrong when generating the confusion matrix?

I am trying to understand and implement classifiers A class in R is based on several UCIs and one of them (http://archive.ics.uci.edu/ml/datasets/Chronic_Kidney_Disease).

When trying to print a confusion matrix you are giving the error “all arguments must have the same length”.

What am I doing wrong?

library(caret)
library(dplyr)
library(e1071)
library(NLP)
library(tm)

ds = read.csv('kidney_disease.csv', 
              header = TRUE)

#Remover colunas inutiliz?veis              
ds <- subset(ds, select = -c(age), classification =='ckd' )

x <- subset(ds, select = -classification) #make x variables
y <- ds$classification #make y variable(dependent)

# test on the whole set
#pred <- predict(model, subset(ds, select=-classification))


trainPositive<-x
testnegative<-y

inTrain<-createDataPartition(1:nrow(trainPositive),p=0.6,list=FALSE)

trainpredictors<-trainPositive[inTrain,1:4]
trainLabels<-trainPositive[inTrain,6]

testPositive<-trainPositive[-inTrain,]
testPosNeg<-rbind(testPositive,testnegative)

testpredictors<-testPosNeg[,1:4]
testLabels<-testPosNeg[,6]

svm.model<-svm(trainpredictors,y=NULL,
               type='one-classification',
               nu=0.10,
               scale=TRUE,
               kernel="radial")

svm.predtrain<-predict(svm.model,trainpredictors)
svm.predtest<-predict(svm.model,testpredictors)

# confusionMatrixTable<-table(Predicted=svm.pred,Reference=testLabels)
# confusionMatrix(confusionMatrixTable,positive='TRUE')

confTrain <- table(Predicted=svm.predtrain,Reference=trainLabels)
confTest <- table(Predicted=svm.predtest,Reference=testLabels)

confusionMatrix(confTest,positive='TRUE')


print(confTrain)
print(confTest)

#grid

Here are some of the first lines of the dataset I'm using:

 id bp    sg al su    rbc       pc        pcc         ba bgr bu  sc sod pot hemo pcv   wc
1  0 80 1.020  1  0          normal notpresent notpresent 121 36 1.2  NA  NA 15.4  44 7800
2  1 50 1.020  4  0          normal notpresent notpresent  NA 18 0.8  NA  NA 11.3  38 6000
3  2 80 1.010  2  3 normal   normal notpresent notpresent 423 53 1.8  NA  NA  9.6  31 7500
4  3 70 1.005  4  0 normal abnormal    present notpresent 117 56 3.8 111 2.5 11.2  32 6700
5  4 80 1.010  2  0 normal   normal notpresent notpresent 106 26 1.4  NA  NA 11.6  35 7300
6  5 90 1.015  3  0                 notpresent notpresent  74 25 1.1 142 3.2 12.2  39 7800
   rc htn  dm cad appet  pe ane classification
1 5.2 yes yes  no  good  no  no            ckd
2      no  no  no  good  no  no            ckd
3      no yes  no  poor  no yes            ckd
4 3.9 yes  no  no  poor yes yes            ckd
5 4.6  no  no  no  good  no  no            ckd
6 4.4 yes yes  no  good yes  no            ckd

The error log:

> confTrain <- table (Predicted = svm.predtrain, Reference = trainLabels)
Table error (Predicted = svm.predtrain, Reference = trainLabels):
all arguments must be the same length
> confTest <- table (Predicted = svm.predtest, Reference = testLabels)
Table error (expected = svm.predtest, reference = testLabels):
all arguments must be the same length
>
> confusionMatrix (confTest, positive = 'TRUE')
ConfusionMatrix error (confTest, positive = "TRUE"):
'confTest' object not found
>
>
> print (confTrain)
Printing error (confTrain): object 'confTrain' not found
> print (confTest)
Printing error (confTest): object 'confTest' not found


What class of object is svm.predtrain? And what are its dimensions?

class(svm.predtrain)
dim(svm.predtrain)

This almost certainly has to do with missing data. That is, the svm.predtrain object you're getting from the predict function is of a shorter length (123 rows?) than the trainLabels object (152 rows?) because svm() is not outputting predictions for the rows where data is missing.

Maybe try na.exclude:

na.exclude differs from na.omit only in the class of the "na.action" attribute of the result, which is "exclude" . This gives different behaviour in functions making use of naresid and napredict : when na.exclude is used the residuals and predictions are padded to the correct length by inserting NA s for cases omitted by na.exclude .

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.