Tidymodels XGBoost error

I was following Julia Silge's instructions for tuning XGBoost with tidymodels and trying to adapt them to my NBA DFS data, but I keep running into the "All models failed" error. Can you help me figure out the issue? I've posted my code below.

> Jan29 <- read.csv("nbaJan29a.csv", header = TRUE)
> Jan29$Position <- as.numeric(Jan29$Position)
> Jan29$First.Name <- as.numeric(Jan29$First.Name)
> Jan29$Nickname <- as.numeric(Jan29$Nickname)
> Jan29$Last.Name <- as.numeric(Jan29$Last.Name)
> Jan29$Played <- as.numeric(Jan29$Played)
> Jan29$Salary <- as.numeric(Jan29$Salary)
> Jan29$RFMin <- as.numeric(Jan29$RFMin)
> Jan29$Team <- as.numeric(Jan29$Team)
> Jan29$Opponent <- as.numeric(Jan29$Opponent)
>
> View(Jan29)
>
> library(tidymodels)
> set.seed(123)
>
> nba_split <- initial_split(Jan29, strata = Today.s.Score)
> nba_train <- training(nba_split)
> nba_test <- testing(nba_split)
>
> xgb_spec <- boost_tree(trees = 1000, tree_depth = tune(), min_n = tune(), loss_reduction = tune(), sample_size = tune(), mtry = tune(), learn_rate = tune()) %>% set_engine("xgboost") %>% set_mode("regression")
>
> xgb_grid <- grid_latin_hypercube(tree_depth(), min_n(), loss_reduction(), sample_size = sample_prop(), finalize(mtry(), nba_train), learn_rate(), size = 20)
>
> xgb_wf <- workflow() %>% add_formula(Today.s.Score ~ .) %>% add_model(xgb_spec)
>
>
> set.seed(123)
> nba_folds <- vfold_cv(nba_train, strata = Today.s.Score)
> nba_folds
#  10-fold cross-validation using stratification
# A tibble: 10 x 2
   splits             id
   <list>             <chr>
 1 <split [1.3K/144]> Fold01
 2 <split [1.3K/144]> Fold02
 3 <split [1.3K/144]> Fold03
 4 <split [1.3K/142]> Fold04
 5 <split [1.3K/142]> Fold05
 6 <split [1.3K/141]> Fold06
 7 <split [1.3K/141]> Fold07
 8 <split [1.3K/140]> Fold08
 9 <split [1.3K/140]> Fold09
10 <split [1.3K/140]> Fold10
>
> doParallel::registerDoParallel()
> set.seed(234)
> xbg_res <- tune_grid(xgb_wf, resamples = nba_folds, grid = xgb_grid, control = control_grid(save_pred = TRUE))
Warning message:
All models failed. See the `.notes` column.

It's very hard to tell without a reproducible example. I do have some suggestions though:

  • Print out xbg_res. It lists the errors.

  • Generally, it is better to give us an example with parallel processing turned off.

  • The initial as.numeric() work seems problematic. In R, we generally want factor vectors for qualitative variables. As integers, you are probably not presenting the data to the model in an appropriate way. For example, the second last name is being quantified as twice the value of the first last name.

  • I suggest changing those as.numeric() calls to as.factor() then using a recipe with step_dummy() to convert them to binary indicators. xgboost needs all numeric features and this is probably the best approach to doing so.
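The factor-plus-step_dummy() suggestion can be sketched roughly as follows. This is a minimal illustration on a made-up data frame, not the poster's data: the `toy` data frame and its `score`, `salary`, and `position` columns are hypothetical stand-ins, and it assumes a reasonably recent version of the recipes package (one that has all_nominal_predictors()).

```r
library(recipes)

# Hypothetical stand-in for the NBA data: one numeric predictor and
# one qualitative predictor stored as a factor.
toy <- data.frame(
  score    = c(31.2, 18.5, 42.0, 25.1, 12.3, 37.8),
  salary   = c(7500, 4200, 9800, 5600, 3500, 8900),
  position = factor(c("PG", "C", "SF", "PG", "C", "SF"))
)

# Convert every factor predictor to binary indicator columns.
rec <- recipe(score ~ ., data = toy) %>%
  step_dummy(all_nominal_predictors())

# Estimate the recipe and apply it to the training data.
baked <- rec %>% prep() %>% bake(new_data = NULL)

# Every column is now numeric, which is what xgboost needs.
print(names(baked))
print(sapply(baked, is.numeric))
```

In the workflow from the original post, a recipe like this would be attached with add_recipe() in place of add_formula().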

Hey Max, thank you for the response! I changed those factors to as.numeric to see if it would help with XGBoost. I get the same error even when I don't transform the data. Here's the error I get:

print(xbg_res)
# Tuning results
# 10-fold cross-validation using stratification 
# A tibble: 10 x 5
   splits             id     .metrics .notes            .predictions
   <list>             <chr>  <list>   <list>            <list>      
 1 <split [1.4K/156]> Fold01 <NULL>   <tibble [20 x 1]> <NULL>      
 2 <split [1.4K/156]> Fold02 <NULL>   <tibble [20 x 1]> <NULL>      
 3 <split [1.4K/156]> Fold03 <NULL>   <tibble [20 x 1]> <NULL>      
 4 <split [1.4K/156]> Fold04 <NULL>   <tibble [20 x 1]> <NULL>      
 5 <split [1.4K/155]> Fold05 <NULL>   <tibble [20 x 1]> <NULL>      
 6 <split [1.4K/155]> Fold06 <NULL>   <tibble [20 x 1]> <NULL>      
 7 <split [1.4K/155]> Fold07 <NULL>   <tibble [20 x 1]> <NULL>      
 8 <split [1.4K/153]> Fold08 <NULL>   <tibble [20 x 1]> <NULL>      
 9 <split [1.4K/153]> Fold09 <NULL>   <tibble [20 x 1]> <NULL>      
10 <split [1.4K/153]> Fold10 <NULL>   <tibble [20 x 1]> <NULL>      
Warning message:
This tuning result has notes. Example notes on model fitting include:
preprocessor 1/1, model 15/20: Error in xgb.iter.update(bst$handle, dtrain, iteration - 1, obj): [14:23:35] amalgamation/../src/objective/objective.cc:23: Unknown objective function reg:squarederror
preprocessor 1/1, model 18/20: Error in xgb.iter.update(bst$handle, dtrain, iteration - 1, obj): [14:23:36] amalgamation/../src/objective/objective.cc:23: Unknown objective function reg:squarederror
preprocessor 1/1, model 4/20: Error in xgb.iter.update(bst$handle, dtrain, iteration - 1, obj): [14:23:38] amalgamation/../src/objective/objective.cc:23: Unknown objective function reg:squarederror
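
The printed warning only shows a few example notes. One way to look at all of them is to pull the tibbles out of the `.notes` list column directly. The sketch below uses a hand-built stand-in (`toy_res`, with shortened placeholder messages) rather than a real tune_grid() result, and it assumes each `.notes` entry is a one-column tibble as in the printout above; the same two calls should apply to `xbg_res`.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for a tuning result: one row per fold, each
# holding a tibble of notes. The messages are made-up placeholders.
toy_res <- tibble(
  id = c("Fold01", "Fold02"),
  .notes = list(
    tibble(.notes = c("model 1/2: Unknown objective function",
                      "model 2/2: Unknown objective function")),
    tibble(.notes = "model 1/2: Unknown objective function")
  )
)

# Notes recorded for the first fold only.
print(toy_res$.notes[[1]])

# Stack the notes from every fold and tally identical messages.
all_notes   <- bind_rows(toy_res$.notes)
note_counts <- count(all_notes, .notes, sort = TRUE)
print(note_counts)
```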

I think that we need a small, reproducible example to solve this one.

Can you please provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you!

If you've never heard of a reprex before, start by reading "What is a reprex", and follow the advice further down that page.

Can you run the following two lines of code and post the results here, please:

require(xgboost)
sessionInfo()

reg:squarederror wasn't a valid xgboost objective until reg:linear was deprecated (in the 0.90 release, if I remember right). If your version of xgboost is older than 1.0.0, you should update the xgboost package and try again. tidymodels might be feeding a "default" value that doesn't exist in your version of xgboost.
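
A quick way to check this is to compare the installed version against a minimum. The sketch below is only an illustration: `needs_update()` is a hypothetical helper, and the "1.0.0" threshold is simply the version named in this reply, not something taken from the xgboost documentation.

```r
# Hypothetical helper: TRUE when the installed version string is older
# than the minimum. The default threshold is an assumption taken from
# the reply above.
needs_update <- function(installed, minimum = "1.0.0") {
  package_version(installed) < package_version(minimum)
}

# Only query the real package if it is installed.
if (requireNamespace("xgboost", quietly = TRUE)) {
  installed <- as.character(packageVersion("xgboost"))
  if (needs_update(installed)) {
    message("xgboost ", installed, " looks old; try install.packages(\"xgboost\")")
  } else {
    message("xgboost ", installed, " should be recent enough")
  }
} else {
  message("xgboost is not installed")
}
```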

Also, this is neither here nor there, but your use of as.numeric() on factors is kind of a necessary evil. XGBoost doesn't work with factors, and depending on the number of levels in the factor, one-hot encoding could make your data frame WAY too large. Boosted tree algorithms are really pretty clever, and if you just do as.numeric() like you did, your variables can still be super helpful in splitting trees. I believe the documentation for LightGBM actually recommends that you do this for categorical variables with many levels. Personally, I've experienced little to no drop-off in accuracy with large decreases in processing time.
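
For reference, a minimal sketch of that integer encoding on a made-up `team` vector (hypothetical data, not from the original file). One thing to watch: as.numeric() on a plain character vector returns NA with a warning, so the column has to be a factor first. Whether read.csv() gives you factors or characters depends on your R version and the stringsAsFactors setting.

```r
# Hypothetical categorical column stored as character strings.
team <- c("LAL", "BOS", "MIA", "LAL")

# Going through factor() assigns each level an integer code
# (levels are sorted, so BOS, LAL, MIA here).
team_factor <- factor(team)
team_codes  <- as.integer(team_factor)

print(levels(team_factor))
print(team_codes)

# By contrast, converting the raw strings directly yields only NA.
direct <- suppressWarnings(as.numeric(team))
print(direct)
```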

You were right! I kept running into an error trying to update xgboost. There's apparently an issue on Windows computers. I found this fix on Stack Overflow that let me update xgboost, and then it worked. Thanks!

dotR <- file.path(Sys.getenv("HOME"), ".R")
if (!file.exists(dotR))
  dir.create(dotR)
M <- file.path(dotR, "Makevars.win")
if (!file.exists(M))
  file.create(M)
cat("\nCXX14FLAGS=-O3 -Wno-unused-variable -Wno-unused-function",
    "CXX14 = $(BINPREF)g++ -m$(WIN) -std=c++1y",
    "CXX11FLAGS=-O3 -Wno-unused-variable -Wno-unused-function",
    file = M, sep = "\n", append = TRUE)
