Hi! I'm trying to fit an xgboost regression model on some Airbnb data using the tidymodels framework. I go through my usual tidymodels steps:
- Split data
library(tidymodels)
data_split <- initial_split(listings_regre, strata = "y", prop = 0.8)
data_train <- training(data_split)
data_test <- testing(data_split)
- Create recipe
rec <- recipe(y ~ ., data = data_train) %>%
  step_nzv(all_nominal()) %>%
  step_dummy(all_nominal())
- Create model
xgb_mod <- boost_tree() %>%
  set_engine('xgboost') %>%
  set_mode('regression')
- Create workflow
xgb_flow <- workflow() %>%
  add_model(xgb_mod) %>%
  add_recipe(rec)
- Fit model
xgb_fit <- xgb_flow %>% last_fit(split = data_split)
Then I get:
preprocessor 1/1, model 1/1: Error in xgboost::xgb.DMatrix(x, label = y, missing = NA): 'data' has class 'character' and length 682192.
'data' accepts either a numeric matrix or a single filename.
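Judging by the message, xgb.DMatrix() received a character matrix instead of a numeric one. The small base-R sketch below illustrates how that can happen. It assumes that the predictors are converted with as.matrix() before they reach xgboost. The toy data frame and its columns are hypothetical, and the Date column is only one illustrative example of a column type that all_nominal() does not select.

```r
# Hypothetical toy predictors: two numeric columns plus one Date column.
# As far as I know, all_nominal() does not select Date columns,
# so step_dummy(all_nominal()) would leave such a column untouched.
toy <- data.frame(
  bedrooms = c(1, 2, 3),
  price_lag = c(80.5, 120, 95),
  last_review = as.Date(c("2020-01-05", "2020-03-17", "2020-06-30"))
)

# as.matrix() on a data frame returns a matrix of a single type.
# A Date (or character) column forces the whole matrix to character.
m_mixed <- as.matrix(toy)
typeof(m_mixed)    # "character": the kind of input xgb.DMatrix() rejects

# Keeping only the numeric columns gives a numeric (double) matrix.
m_numeric <- as.matrix(toy[, c("bedrooms", "price_lag")])
typeof(m_numeric)  # "double"
```

This would also fit with add_formula() working: a formula interface typically builds its matrix through model.matrix(), which handles column types differently than a plain as.matrix() call.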
But if I change the workflow to
xgb_flow <- workflow() %>%
  add_model(xgb_mod) %>%
  add_formula(y ~ .)
Everything works just fine.
I understood from here that both approaches should work, but that is not what happens. Does anybody know what is wrong with my recipe? I prefer working with recipes, so I'd rather use the first option.
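One way to narrow this down is to check what the recipe actually passes to the model. The sketch below uses a helper, non_numeric_cols(), that is hypothetical and not part of tidymodels. The helper lists the columns of a data frame that are neither numeric nor logical. To keep the example runnable in base R alone, it is applied to a toy data frame with made-up columns. With the recipe above, the idea would be to call it on the baked training set instead, as shown in the final comment.

```r
# Hypothetical helper: names of columns that are neither numeric nor logical.
# If any such column is present, as.matrix() will not return a numeric matrix.
non_numeric_cols <- function(df) {
  flagged <- vapply(df, function(col) !(is.numeric(col) || is.logical(col)), logical(1))
  names(df)[flagged]
}

# Toy stand-in for a baked recipe output (hypothetical columns).
baked_toy <- data.frame(
  y = c(50, 75, 60),
  dummy_a = c(0, 1, 0),
  signup_date = as.Date(c("2015-02-01", "2018-07-12", "2019-11-30")),
  flag = c(TRUE, FALSE, TRUE)
)

non_numeric_cols(baked_toy)  # "signup_date"

# With the recipe from the question (requires recipes to be loaded):
# non_numeric_cols(bake(prep(rec), new_data = NULL))
```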
Thank you in advance!