Hello all!
I'm trying to fit a random forest and then use the predict function to assess the accuracy of the model.
I have a training data set with 7397 rows x 13 features
and a validation data set with 2468 rows x 13 features.
Fitting the random forest on the training data works without any problem, but when I try to predict on the validation data to assess the accuracy, I get this error:
Error in model.frame.default(Terms, newdata, na.action = na.omit) :
variable lengths differ (found for 'Administrative')
In addition: Warning message:
'newdata' had 2468 rows but variables found have 7397 rows
So I fitted the model on a subset of the training data, a random sample of 2468 rows (the same length as the validation data), but I still got the same error.
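library(randomForest)
n_v <- 2468
# Draw n_v row indices at random from the training data
train_2 <- sample(1:nrow(online_shoppers_intention_train), n_v)
# Fit the forest on the sampled rows only
OSI.ran.forest.3 <- randomForest(Revenue ~ ., data = online_shoppers_intention_train, subset = train_2, mtry = 12, importance = TRUE)
# Predict on the validation data
yhat.OSI <- predict(OSI.ran.forest.3, newdata = validation_db)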
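For context, this is roughly how the accuracy would be assessed once the predictions are available. It is only a minimal sketch on made-up labels: `yhat_toy` and `truth_toy` are hypothetical stand-ins, not output from the model above.

```r
# Hypothetical predicted and true class labels (not real results).
yhat_toy  <- factor(c("TRUE", "FALSE", "FALSE", "TRUE"), levels = c("FALSE", "TRUE"))
truth_toy <- factor(c("TRUE", "FALSE", "TRUE", "TRUE"), levels = c("FALSE", "TRUE"))

# Confusion matrix: rows are predictions, columns are true labels.
conf_mat <- table(predicted = yhat_toy, actual = truth_toy)

# Accuracy is the share of predictions that match the true label.
accuracy <- mean(yhat_toy == truth_toy)

print(conf_mat)
print(accuracy)
```

With the real data, `yhat_toy` would be the predictions and `truth_toy` the `Revenue` column of the validation data.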
Neither data set has any missing values; I have already checked.
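A minimal sketch of such a check, run on two small made-up data frames (`train_df` and `valid_df` are hypothetical stand-ins, not the real data). It counts missing values and also compares the column names, since `predict` looks up each model variable by name in `newdata`.

```r
# Hypothetical stand-ins for the real training and validation data.
train_df <- data.frame(Administrative = c(0, 1, 2, 3),
                       Revenue = factor(c("FALSE", "TRUE", "FALSE", "TRUE")))
valid_df <- data.frame(Administrative = c(4, 5),
                       Revenue = factor(c("TRUE", "FALSE")))

# Count missing values in each data frame.
n_missing_train <- sum(is.na(train_df))
n_missing_valid <- sum(is.na(valid_df))

# Columns present in one data frame but not in the other.
only_in_train <- setdiff(names(train_df), names(valid_df))
only_in_valid <- setdiff(names(valid_df), names(train_df))

print(c(train = n_missing_train, validation = n_missing_valid))
print(only_in_train)
print(only_in_valid)
```

With the real data, `train_df` and `valid_df` would be replaced by `online_shoppers_intention_train` and `validation_db`.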