Back again, this time with a problem making predictions after a randomForest analysis.
I know I'm close to a solution, but I can't find how to fix it.
Some background:
I'm working with patchy, incomplete data (so there are lots of NAs everywhere, otherwise it would be too simple). I have trained a random forest that should predict whether my specimens are type A or B. The random forest itself runs fine, no problems there. The trouble starts as soon as I try to run predictions on a new data sample (called species_to_predict): the script seems to stop on the NAs instead of simply predicting the rows that have no problem.
I hope this is clear enough; here is the relevant excerpt of the code:
species.rf <- randomForest(species.imputed[,1:42], species$hyo_ortho)
predicted <- predict(species.rf, newdata = species_to_predict)
The random forest is trained on only 42 of the 43 columns: the last column holds my famous A or B type (hyo_ortho), which is left out of the predictors so that the dimensions match.
Unsurprisingly, when I run the script I get:
Error in predict.randomForest(species.rf, newdata = species_to_predict, :
missing values in newdata
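The message means that `newdata` contains at least one `NA`. A quick way to see where the missing values are is sketched below in base R, on a small made-up data frame; the column names are placeholders, not the real columns of `species_to_predict`.

```r
# Made-up stand-in for `species_to_predict` (assumption: the real object is a data frame)
new_data <- data.frame(
  length = c(1.2, NA, 3.4, 2.2),
  width  = c(0.5, 0.7, NA, 0.9),
  type   = factor(c("x", "y", "x", NA))
)

# Number of missing values in each column
na_per_column <- colSums(is.na(new_data))
print(na_per_column)

# Row indices with at least one NA: these are the rows that trigger the error
incomplete_rows <- which(!complete.cases(new_data))
print(incomplete_rows)
```

Running the same two lines on the real `species_to_predict` shows whether the NAs are spread over many rows or concentrated in a few columns.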
How can I get the script to skip the NAs and still predict the remaining rows?
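One way to do this is to predict only the complete rows and return `NA` for the others. Below is a minimal sketch, assuming `species_to_predict` is a data frame with the same predictor columns (same names) as the training data. Here `iris` stands in for the real data, `Species` plays the role of `hyo_ortho`, and `predict_complete_rows()` is a hypothetical helper, not part of the randomForest package.

```r
library(randomForest)

# Toy stand-in for the real data (assumption: iris replaces `species`,
# Species plays the role of `hyo_ortho`); keep two classes, like A/B
set.seed(42)
train <- iris[iris$Species != "setosa", ]
train$Species <- droplevels(train$Species)
rf <- randomForest(train[, 1:4], train$Species)

# New sample with some missing values, like `species_to_predict`
new_data <- train[c(1, 2, 51, 52), 1:4]
new_data[2, "Sepal.Width"] <- NA
new_data[4, "Petal.Length"] <- NA

# Hypothetical helper: predict rows without NA, return NA for the others
predict_complete_rows <- function(model, newdata) {
  ok <- complete.cases(newdata)
  # Start with an all-NA factor carrying the classes the forest knows
  pred <- factor(rep(NA, nrow(newdata)), levels = model$classes)
  if (any(ok)) {
    pred[ok] <- predict(model, newdata = newdata[ok, , drop = FALSE])
  }
  pred
}

predicted <- predict_complete_rows(rf, new_data)
print(predicted)
```

The result has one entry per row of the new sample, so it can be bound back to the data directly. If you would rather get a prediction for every row, `na.roughfix()` from randomForest replaces NAs with the column median (numeric columns) or the most frequent level (factors) before calling `predict()`; keep in mind that those predictions then rest on crude imputed values.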
Thank you in advance.