prediction function error

Hello,
I am new to machine learning. I built a random forest model to train and test on my data, as shown below:

rf_classifier <- randomForest(x= training_set[,-20],
                        y = as.factor(training_set$Class.ASD),
                           ntree= 10)

I then used it to predict on the test set, as below:

y_pred_rf <- predict(rf_classifier, newdata = test_set[,-20])
head(y_pred_rf)
>
  1   2   4   7   9  12 
 NO  NO  NO YES YES  NO 
Levels: NO YES
>

I want to build a precision-recall curve, so I did this:

rf_prediction <- prediction(y_pred_rf, test_set$Class.ASD)

RP.perf <- performance(rf.train, "prec", "rec")
The code above gives this error:
Error: Format of predictions is invalid. It couldn't be coerced to a list.

I then used the following to check the type of structure the y_pred_rf is

sapply(c(is.vector, is.matrix, is.list, is.data.frame), do.call, list(y_pred_rf))

It shows

[1] FALSE FALSE FALSE FALSE
sapply(c(is.vector, is.matrix, is.list, is.data.frame), do.call, list(test_set$Class.ASD))

While this shows

[1]  TRUE FALSE FALSE FALSE

How do I correct this in order to build the curve?

You use prediction() to make rf_prediction, but then you don't use it; instead you pass rf.train to performance(). Was rf.train a prediction object that you made earlier, or did you intend rf_prediction instead of rf.train?

Sorry, I posted the wrong code

rf_prediction<- prediction(y_pred_rf, test_set$Class.ASD)

The code above is generating the following error

Error: Format of predictions is invalid. It couldn't be coerced to a list.

The code below is what I used for the performance step. It won't even run, because "rf_prediction" is never created:

RF_Pre_Rec <- performance(rf_prediction, measure = "prec", x.measure = "rec")

I would expect that your issue is that the labels are characters, i.e.:

(somepreds <- factor(c("yes","yes","yes","no","no"),levels = c("yes","no")) )
(somelabels <- c("yes","no","yes","no","no"))
library(ROCR)
prediction(somepreds,somelabels)
#Error: Format of predictions is invalid. It couldn't be coerced to a list.

I have converted it into a factor, and I still get the same error. The main problem is that
"y_pred_rf" is not a vector, matrix, list, or data frame.

How does one use the predict function to create a vector of predictions, as below?
y_pred_rf <- predict(rf_classifier, newdata = test_set[,-20])

To find out what y_pred_rf is, run:

class(y_pred_rf)
str(y_pred_rf)

I find this to be a vague statement. Can you be explicit?

This is what it shows

class(y_pred_rf)
[1] "factor"

str(y_pred_rf)
 Factor w/ 2 levels "NO","YES": 1 1 1 2 2 1 2 1 1 2 ...
 - attr(*, "names")= chr [1:87] "1" "2" "4" "7" ...

OK, so the predictions are a factor; that seems fine to me.
And the labels, what are they, please?

I'm not sure I get what you mean. I assume you are referring to attr(*, "names"). I think those are the row names/numbers of the rows selected into the test set, as opposed to the training set. I am not sure why "y_pred_rf" shows this rather than just a vector of predictions. This is how 'predict' was applied:
y_pred_rf <- predict(rf_classifier, newdata = test_set)

Sorry, that isn't what I mean.
Look at the prediction function, and see how it is intended to work.

?prediction

Usage
prediction(predictions, labels, label.ordering = NULL)
Arguments
predictions
A vector, matrix, list, or data frame containing the predictions.

labels
A vector, matrix, list, or data frame containing the true class labels. Must have the same dimensions as predictions.

This is consistent with the example I provided, where I guessed at your problem.
However, I was a little off: we do in fact need a numeric representation of your predictions. A working adjustment to my example would be:

(somepreds <- factor(c("yes","yes","yes","no","no"),levels = c("yes","no")) )
(somelabels <- factor(c("yes","no","yes","no","no")))
library(ROCR)
(preds_num <- as.numeric(somepreds))

(mypred <- prediction(preds_num,somelabels))

(myperf <- performance(mypred,"prec", "rec"))

plot(myperf)

Here, somepreds is analogous to y_pred_rf,
and somelabels stands in for your test_set$Class.ASD.
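As an aside, integer codes are only one possible numeric representation. A random forest can also return class probabilities, which give ROCR more than two distinct score values to threshold. The sketch below is only an illustration under assumptions: it uses the built-in iris data (reduced to two classes) as a stand-in for your data, and names such as dat, train, test and scores are hypothetical. It relies on predict() for randomForest objects accepting type = "prob".

```r
library(randomForest)
library(ROCR)

set.seed(42)

# Toy two-class data standing in for training_set / test_set (assumption)
dat <- iris[iris$Species != "setosa", ]
dat$Species <- droplevels(dat$Species)
idx <- sample(nrow(dat), 70)
train <- dat[idx, ]
test <- dat[-idx, ]

rf <- randomForest(x = train[, -5], y = train$Species, ntree = 100)

# type = "prob" returns a matrix of class probabilities, one column per level
probs <- predict(rf, newdata = test[, -5], type = "prob")

# Use the probability of the class treated as "positive" as the score
scores <- probs[, "virginica"]

# label.ordering is c(negative class, positive class)
pred <- prediction(scores, test$Species,
                   label.ordering = c("versicolor", "virginica"))
perf <- performance(pred, measure = "prec", x.measure = "rec")
plot(perf)
```

With your objects, the equivalent would presumably be the "YES" column of the probability matrix, with test_set$Class.ASD as the labels.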

I get that. Are you saying I should convert "y_pred_rf" to numbers? Also, "y_pred_rf" was not created as a vector. Is there a way the predictions can be created as a vector? What I mean is: why does the code below not result in a vector of predictions?
y_pred_rf <- predict(rf_classifier, newdata = test_set)

is.vector(y_pred_rf)
[1] FALSE

OK, I think the confusion arises because factors aren't technically 'vectors' as far as is.vector() is concerned, even though for most intents and purposes they act like them.

is.vector(factor(c(1,2)))

I'm saying yes, convert y_pred_rf to its numeric representation (factors are integers under the hood).
y_pred_rf was created as a factor variable; you have shown me as much.
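To make the point concrete, here is a small base R sketch. The factor f is a made-up stand-in for y_pred_rf, not your actual predictions.

```r
# A small named factor, similar in shape to what predict() returned (assumption)
f <- factor(c("NO", "YES", "NO"), levels = c("NO", "YES"))
names(f) <- c("1", "2", "4")

is.vector(f)   # FALSE: a factor carries levels and class attributes
is.factor(f)   # TRUE
typeof(f)      # "integer": the values are stored as integer codes

# as.numeric() returns the underlying codes: 1 for "NO", 2 for "YES"
codes <- as.numeric(f)
codes

# The codes map back to the labels through levels()
levels(f)[codes]
```

The codes follow the order of the factor levels, so it is worth checking levels(y_pred_rf) before deciding which code counts as the positive class.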

Thanks, I get it. This is what I did
preds_num_rf <- as.numeric(y_pred_rf)
rf_prediction<- prediction(preds_num_rf, test_set$Class.ASD) # Create a prediction object
RF_Pre_Rec <- performance(rf_prediction, measure = "prec", x.measure = "rec")
plot (RF_Pre_Rec)

Note:
"rf_prediction" was created, and so was "RF_Pre_Rec".

plot (RF_Pre_Rec) generated the following graph. Does this graph make sense?

Do a cross table of prediction against label:

table(preds_num_rf, test_set$Class.ASD)

I expect you will see very small numbers, if any, in one quadrant: the true positives.

Thanks; I think I get why now. I did it as shown below.

table(preds_num_rf, test_set$Class.ASD)

preds_num_rf NO YES
           1 45   0
           2  0  42
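For reference, precision and recall can be computed directly from these counts in base R. This sketch assumes that code 2 corresponds to "YES" and that "YES" is the positive class; the matrix cm is simply the table above typed in by hand.

```r
# Confusion counts copied from the table above
# (rows: predicted code, columns: true label)
cm <- matrix(c(45, 0,
                0, 42),
             nrow = 2, byrow = TRUE,
             dimnames = list(predicted = c("1", "2"),
                             actual = c("NO", "YES")))

tp <- cm["2", "YES"]  # predicted YES, actually YES
fp <- cm["2", "NO"]   # predicted YES, actually NO
fn <- cm["1", "YES"]  # predicted NO, actually YES

precision <- tp / (tp + fp)
recall <- tp / (tp + fn)

# Both equal 1 here, because the table has no off-diagonal counts
c(precision = precision, recall = recall)
```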
