Error in xgboost: Feature names stored in `object` and `newdata` are different!

Hi everybody!
I wrote a script using xgboost to predict a new class. With iris it works like this:

library(xgboost)
library(tidyverse)
library(caret)
library(readxl)
library(data.table)
library(mlr)

data <- iris
righe_train <- sample(nrow(data),nrow(data)*0.8)
train <- data[righe_train,]
test <- data[-righe_train,]

setDT(train) 
setDT(test)

labels <- train$Species
ts_label <- test$Species
new_tr <- model.matrix(~.+0,data = train[,-c("Species"),with=F]) 
new_ts <- model.matrix(~.+0,data = test[,-c("Species"),with=F])

#convert factor to numeric 
labels <- as.numeric(labels)-1
ts_label <- as.numeric(ts_label)-1
class(new_tr)

#preparing matrix 
dtrain <- xgb.DMatrix(data = new_tr,label = labels) 
dtest <- xgb.DMatrix(data = new_ts,label=ts_label)

#default parameters
params <- list(booster = "gbtree",
                 objective = "multi:softmax",
                 num_class = 3,
                 eta=0.3,
                 gamma=0,
                 max_depth=6,
                 min_child_weight=1,
                 subsample=1,
                 colsample_bytree=1)

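xgbcv <- xgb.cv( params = params,
                 data = dtrain,
                 nrounds = 100,
                 nfold = 5,
                 showsd = T,
                 stratified = T,
                 metrics = "merror",
                 print_every_n = 10,
                 early_stopping_rounds = 20,
                 maximize = F)
##best iteration 

# CV results are stored in evaluation_log in current xgboost releases;
# the column name follows the metric requested above
min(xgbcv$evaluation_log$test_merror_mean)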


#first default - model training
xgb1 <- xgb.train (params = params,
                   data = dtrain, 
                   nrounds = 21,
                   watchlist = list(val=dtest,train=dtrain),
                   print_every_n = 10,
                   early_stopping_rounds = 10,
                   maximize = F ,
                   eval_metric = "merror")
#model prediction
xgbpred <- predict (xgb1,dtest)
# multi:softmax already returns class indices (0, 1, 2), so no 0.5 threshold is applied

#confusion matrix
# give predictions and labels the same factor levels so they can be compared
xgbpred_f <- factor(xgbpred, levels = 0:2)
ts_label_f <- factor(ts_label, levels = 0:2)

confusionMatrix (xgbpred_f,ts_label_f)

#new record
(new_record_raw <- c(5.3,3.2,2.0,0.2))
(new_record_mat <- matrix(new_record_raw,nrow = 1))
(new_record_dmat <- xgb.DMatrix(data = new_record_mat))
predict(xgb1,newdata=new_record_dmat)

But when I run the `#new record` part on my own dataset, I get this error:

Error in predict.xgb.Booster(xgb1, newdata = xgb.DMatrix(data = as.matrix(test))) : 
  Feature names stored in `object` and `newdata` are different!

Why do I get this error? Where could I have gone wrong? Can anyone suggest something to try?

You haven't created a matrix with the same feature names that the model was trained on.



colnames(dtrain)
colnames(dtest)

#new record
(new_record_raw <- c(5.3,3.2,2.0,0.2))
(new_record_mat <- matrix(new_record_raw,nrow = 1))
(new_record_dmat <- xgb.DMatrix(data = new_record_mat))
#see that there are no column names yet ...
colnames(new_record_dmat)
#if you supplied the values in the same order as the old feature names, you can copy the names over
colnames(new_record_dmat) <- colnames(dtrain)
colnames(new_record_dmat)

Ok, but now I have this error:

Error in dimnames(x) <- dn : 
  length of 'dimnames' [2] not equal to array extent

I guess you aren't providing the correct number of fields.

I don't think so, because in the training set I have 20 features plus the one to predict. In the test set I only have the 20 features.

Why not get the dimensions of the objects on both sides of your assignment? Then you will know how many columns each one actually has.
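
One way to run that check, sketched in base R. The matrices `train_mat` and `new_mat` are made-up stand-ins (not the poster's data), and `align_to_training` is a hypothetical helper, not an xgboost function. The idea is to compare column counts and names first, and only then put the new data into the training layout:

```r
# Stand-in for the matrix the model was trained on (3 features).
train_mat <- matrix(1:6, nrow = 2,
                    dimnames = list(NULL, c("f1", "f2", "f3")))

# Stand-in for new data: same features in another order, plus an extra column.
new_mat <- matrix(c(10, 20, 30, 40), nrow = 1,
                  dimnames = list(NULL, c("f3", "f1", "id", "f2")))

# Step 1: look at both sides of the assignment before copying names across.
dim(train_mat)   # 2 3
dim(new_mat)     # 1 4, so assigning 3 column names to it would fail

# Step 2: list what is missing or unexpected.
setdiff(colnames(train_mat), colnames(new_mat))  # features missing from new data
setdiff(colnames(new_mat), colnames(train_mat))  # extra columns in new data

# Hypothetical helper: keep only the training features, in training order.
align_to_training <- function(new_data, feature_names) {
  missing <- setdiff(feature_names, colnames(new_data))
  if (length(missing) > 0) {
    stop("new data lacks features: ", paste(missing, collapse = ", "))
  }
  new_data[, feature_names, drop = FALSE]
}

aligned <- align_to_training(new_mat, colnames(train_mat))
colnames(aligned)
```

The aligned matrix could then be wrapped in xgb.DMatrix() and passed to predict() as in the snippet above. If the training matrix was built with model.matrix() on data that contains factor columns, the new data probably needs to go through the same encoding, because 20 raw features can expand into more than 20 matrix columns.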
