Warning while predicting with a Naïve Bayes model

I'm facing a curious issue when using a trained Naïve Bayes model to predict results on a test set. The reprex below illustrates the problem.

library(naivebayes)
#> naivebayes 0.9.7 loaded
library(e1071)

data <- data.frame(predictor = as.factor(rep(1:6, 4)),
                   label = c("F", "F", "F", "F", "F", "S", 
                             "S", "S", "S", "F", "F", "S",
                             "S", "F", "F", "F", "F", "S",
                             "F", "F", "F", "F", "F", "S"),
                   stringsAsFactors = TRUE)

train <- data[1:18, ]
test <- data[19:24, ]


nb <- naive_bayes(label ~ predictor, data = train, laplace = 1)

# The implementation found in the naivebayes package generates a warning during prediction.
predict(nb, newdata = test)
#> Warning: predict.naive_bayes(): more features in the newdata are provided as
#> there are probability tables in the object. Calculation is performed based on
#> features to be found in the tables.
#> [1] S F F F F S
#> Levels: F S


nb_e1071 <- naiveBayes(label ~ predictor, data = train, laplace = 1)

# No warnings are generated when using the e1071 implementation.
predict(nb_e1071, newdata = test)
#> [1] S F F F F S
#> Levels: F S

Created on 2020-05-29 by the reprex package (v0.3.0)

The predicted results of both models are the same. What I'd like to know is:

What does the warning generated by the naive_bayes() function mean? Can it be safely ignored? If not, what steps do I need to take to fix it?

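Try dropping the label column before predicting (select() comes from dplyr):

predict(nb, newdata = select(test, -label))

With newdata = test, you are telling the predict method to score using every column of test, including the label you are trying to predict. The warning simply says that newdata contains more columns than the model needs. Only predictor is required here, and as the message itself states, the calculation uses only the features found in the model's tables, so the extra column is ignored.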
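If you would rather not load dplyr, the same idea can be sketched in base R. This reuses the data from the question. Subsetting by the column name "predictor" is my own choice for illustration, and I'd expect the warning to disappear because newdata then holds only the column the model was trained on.

```r
library(naivebayes)

# Same data as in the question.
data <- data.frame(predictor = as.factor(rep(1:6, 4)),
                   label = c("F", "F", "F", "F", "F", "S",
                             "S", "S", "S", "F", "F", "S",
                             "S", "F", "F", "F", "F", "S",
                             "F", "F", "F", "F", "F", "S"),
                   stringsAsFactors = TRUE)

train <- data[1:18, ]
test <- data[19:24, ]

nb <- naive_bayes(label ~ predictor, data = train, laplace = 1)

# Keep only the feature column; drop = FALSE keeps the result a data frame
# rather than collapsing it to a bare factor.
test_features <- test[, "predictor", drop = FALSE]

pred <- predict(nb, newdata = test_features)
pred
```

The predicted classes should match the ones in the question, since the label column was never used in the calculation anyway.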

Oh, I see. And I presume the e1071 variant doesn't check for this...hence no warning?
