tidymodels xgboost counts=TRUE

I am trying to run an xgboost model using tidymodels and get the error "Error: The option counts = TRUE was used but parameter colsample_bynode was given as 0.333. Please use a value >= 1 or use counts = FALSE". But I never specified colsample_bynode. Here is how I specified my model:

boost_spec <- boost_tree(
  trees = 1000,             # number of trees, T in the equations above
  tree_depth = 2,          # max depth of each tree
  min_n = 5,               # min points required for node to be further split
  loss_reduction = 10^-5,  # when to stop - smaller = more since it only has to get a little bit better 
  sample_size = 1,         # proportion of training data to use
  mtry = 1/3,              # proportion of predictors used
  learn_rate = tune(),     # lambda from the equations above
  stop_iter = 50           # number of iterations without improvement before stopping
) %>% 
  set_engine("xgboost") %>% 
  set_mode("regression")
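The colsample_bynode in the error comes from parsnip itself: in recent parsnip versions the xgboost engine passes mtry to xgboost as colsample_bynode, and the engine's counts option defaults to TRUE (the error message shows it was in effect even though it was never set), so mtry is expected to be a whole number of predictors. One way to check the mapping, assuming a recent parsnip version, is translate(), which builds the fit call without fitting anything:

```r
library(parsnip)  # also re-exports the %>% pipe

# Minimal spec: mtry = 1/3 with the default engine settings (counts = TRUE).
spec <- boost_tree(mtry = 1/3) %>%
  set_engine("xgboost") %>%
  set_mode("regression")

# translate() prints the call parsnip will make to its xgboost wrapper;
# mtry should appear there under the xgboost name colsample_bynode.
translate(spec)
```

If the printed fit template lists colsample_bynode = 1/3, that value is still being treated as a count, which is what fails once the model is fit.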

This code worked back in April 2021, but I see there's a note here about some updates to the mtry parameter. I'm confused about where I'm supposed to add counts = FALSE.

I guess I asked too soon. I think I solved it using this code, although I'm not exactly sure what I should use for mtry now or if it matters:

boost_spec <- boost_tree(
  trees = 1000,             # number of trees, T in the equations above
  tree_depth = 2,          # max depth of each tree
  min_n = 5,               # min points required for node to be further split
  loss_reduction = 10^-5,  # when to stop - smaller = more since it only has to get a little bit better 
  sample_size = 1,         # proportion of training data to use
  mtry = 30,               # now a count: number of predictors sampled at each split (counts = TRUE by default)
  learn_rate = tune(),     # lambda from the equations above
  stop_iter = 30           # number of iterations without improvement before stopping
) %>% 
  set_engine("xgboost", colsample_bytree = 1) %>% # colsample_bytree = proportion of predictors sampled for each tree, 1 = all
  set_mode("regression")
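With the default counts = TRUE, mtry has to be an integer count, so mtry = 30 means 30 predictors per split. If the goal is still roughly a third of the predictors, one option (an assumption on my part, not the fix used in this thread) is parsnip's .cols() descriptor, which is only evaluated at fit time as the number of predictor columns. The name boost_spec_count is hypothetical:

```r
library(parsnip)

# Hypothetical variant: keep counts = TRUE and let mtry be computed at fit
# time as about a third of the predictor columns (at least 1).
boost_spec_count <- boost_tree(
  trees = 1000,
  tree_depth = 2,
  mtry = max(1, floor(.cols() / 3))  # .cols() is a parsnip descriptor
) %>%
  set_engine("xgboost") %>%          # counts = TRUE is the default here
  set_mode("regression")
```

The descriptor is captured unevaluated, so the spec can be built before any data is available.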

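Solved by adding counts = FALSE to the engine arguments, i.e. set_engine("xgboost", counts = FALSE), so that mtry = 1/3 is interpreted as a proportion rather than a count.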
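Since counts is an engine argument, it goes inside set_engine() rather than boost_tree(). A sketch of the original specification with that change, keeping mtry = 1/3 as a proportion (this assumes a parsnip version that re-exports tune(), as recent ones do):

```r
library(parsnip)

boost_spec <- boost_tree(
  trees = 1000,
  tree_depth = 2,
  min_n = 5,
  loss_reduction = 10^-5,
  sample_size = 1,
  mtry = 1/3,              # proportion of predictors sampled at each split
  learn_rate = tune(),
  stop_iter = 50
) %>%
  set_engine("xgboost", counts = FALSE) %>%  # read mtry as a proportion
  set_mode("regression")
```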
