I'm trying to fit a random forest and then use the predict function to assess the accuracy of the model.
I have a train database with 7397 rows x 13 features
and a validation database with 2468 rows x 13 features.
Fitting the random forest on the train database works without any problem, but when I try to predict and assess the accuracy on the validation database I get this error:
Error in model.frame.default(Terms, newdata, na.action = na.omit) :
variable lengths differ (found for 'Administrative')
In addition: Warning message:
'newdata' has 2468 rows but the variable found has 7397 rows
So I used a subset of the train database, a sample of 2468 rows (the same length as the validation database), but I still got the same error.
Maybe this happens because the validation database comes from a separate file (it was given to me by my professor) rather than being taken from the train database itself? Both come from the same original database, which the professor split in 3: train, validation and test.
Train data: about 60% of the units of the original dataset
Validation data: about 20% of the units of the original dataset
Test data: about 20% of the units of the original dataset
I found that a column name in the validation database was different from the one in the train database and fixed it, but I still get the same error. Now the "variable lengths differ" error is reported for "Month", and I can't understand what is going on. This is driving me crazy; I've been trying to fix it for hours.
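Since fixing one column name only moved the error on to "Month", one check that may narrow this down is to compare all the column names and classes of the two data frames at once. When a variable used by the model is missing from newdata, R can fall back to a same-named object it finds elsewhere, which would explain a variable with 7397 rows showing up during prediction. This is only a minimal sketch on made-up data: the data frame names `train` and `validation`, the column `y`, and all the values are hypothetical stand-ins, not my real data.

```r
# Hypothetical stand-ins for the real train and validation data frames.
train <- data.frame(
  Administrative = c(0, 1, 2, 3),
  Month = factor(c("Feb", "Mar", "May", "Nov")),
  y = factor(c("no", "yes", "no", "yes"))  # hypothetical response column
)
validation <- data.frame(
  Administrative = c(1, 4),
  month = factor(c("Mar", "May")),  # same variable, but the name differs by case
  y = factor(c("no", "yes"))
)

# Columns present in train but absent from validation.
missing_in_validation <- setdiff(names(train), names(validation))
# Columns present in validation but absent from train.
extra_in_validation <- setdiff(names(validation), names(train))

print(missing_in_validation)
print(extra_in_validation)

# For the columns both frames share, list those whose class differs.
shared <- intersect(names(train), names(validation))
train_classes <- sapply(train[shared], function(x) class(x)[1])
validation_classes <- sapply(validation[shared], function(x) class(x)[1])
class_mismatch <- shared[train_classes != validation_classes]

print(class_mismatch)
```

If `missing_in_validation` is not empty on the real data, those columns would be the first thing to rename or recode before calling predict again.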