Error in model.frame.default(formula = digitInfo$label ~ ., data = DigTrain, : variable lengths differ (found for 'X1x1')

I downloaded the CSV from Kaggle. I first used Naive Bayes and then tried decision trees to compare the two methods, but I am having trouble with rpart. I've included all the code up to the point where I get the error, in case that helps. I am new to coding, so I'm not sure whether it does. I've tried searching for

#Check the structure of the dataframe, the label is integer, should be factor
#so all the same labels get grouped together later on.
str(digitInfo)
#Changing label to type factor
digitInfo$label<-as.factor(digitInfo$label)
str(digitInfo)

#Create training data and testing data, then load the e1071 package.
Sample<- as.integer(nrow(digitInfo)/3)
Sample1<-sample(nrow(digitInfo),Sample)

(DigTest<-digitInfo[Sample1,])
(DigTrain<-digitInfo[-Sample1,])

#Make sure label is factor type

str(DigTest)

#Copy the labels

(TestLabels <- DigTest[,1])
str(DigTest)

#Remove the labels

(DigTestNOLabel <- DigTest[,-c(1)])

library(e1071)
#The average number of times 28x21 was used by the numbers 7 & 9
#were 0.45 & 0.08 respectively & their standard deviations were 7.43 for 7 and 2.03 for 9.
(NBe1071<-naiveBayes(DigTrain, DigTrain$label, laplace = 1))
NBe1071Pred <- predict(NBe1071, DigTestNOLabel)

#NB_e1071

table(NBe1071Pred,TestLabels)
(NBe1071Pred)

#Visualize

plot(NBe1071Pred)

#Decision Tree
#Got an error when trying to run rpart: "variable lengths differ". When I searched for the error, the results
#said to make sure there are no NAs in the data and then to check the data types. Everything looks okay.

sum(is.na(digitInfo))
sum(is.na(DigTrain))
str(DigTrain)
str(digitInfo)

fitTrain<- rpart(digitInfo$label ~ . , data = DigTrain, method="class")

Error in model.frame.default(formula = digitInfo$label ~ ., data = DigTrain, :
variable lengths differ (found for 'X1x1')

DigTrain is a subset of digitInfo

(DigTrain<-digitInfo[-Sample1,])

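digitInfo$label has one value per row of digitInfo, while the predictors in DigTrain (starting with X1x1) have only about two thirds as many rows, hence "variable lengths differ". So you cannot fit the label column of digitInfo against the data of DigTrain, as you try to do with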

fitTrain<- rpart(digitInfo$label ~ . , data = DigTrain, method="class")

This should work, because a bare column name in the formula is looked up in the data frame passed as data:

fitTrain<- rpart(label ~ . , data = DigTrain, method="class")
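
If you want to see the mismatch without the Kaggle file, here is a minimal sketch. It assumes the built-in iris data as a stand-in for digitInfo, and the names test_idx, train, test, bad, fit and pred are made up for the illustration, not taken from the code above. It splits the data the same way, shows the two lengths that disagree, reproduces the error, and then fits with the bare column name.

```r
library(rpart)  # recommended package, shipped with standard R installations

set.seed(1)  # only so the split is reproducible
# Same split scheme as in the question: one third test, the rest training
test_idx <- sample(nrow(iris), as.integer(nrow(iris) / 3))
test  <- iris[test_idx, ]
train <- iris[-test_idx, ]

# The two lengths that get compared in the failing call
length(iris$Species)  # 150, one value per row of the full data
nrow(train)           # 100, rows in the training subset

# Response from the full data, predictors from the subset: reproduces the error
bad <- try(rpart(iris$Species ~ ., data = train, method = "class"),
           silent = TRUE)
inherits(bad, "try-error")  # TRUE

# Bare column name: Species is looked up inside `train`, so the lengths agree
fit  <- rpart(Species ~ ., data = train, method = "class")
pred <- predict(fit, test, type = "class")
table(pred, test$Species)  # confusion table on the held-out rows
```

The same pattern should carry over to the digit data: refer to label by its bare name and let data = DigTrain supply every column.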

Thank you, that did work! I (obviously) didn't know that, but I will certainly remember it. You've saved me from even more hours of searching for answers!
