Hi there! Thanks as ever for all the incredible work that's gone into creating the tidymodels framework; I can't convey how useful it's been to my research!
My question is about using xgboost: specifically, how can I access the predictions/fit to the training data of the underlying model being trained (without using `predict()`)?
To clarify what I mean: when fitting a random forest model, I can explore the fitted model (`rf_fit` in the reprex below) and its predictions on the training data in two ways.

- Calling `predict(rf_fit, cells, type = "prob")` (Method 1).
- Getting the stored predictions from `rf_fit$fit$predictions` (Method 2).

These result in different predictions, for reasons that have been clarified here.
In this case, I'm particularly interested in the equivalent of `rf_fit$fit$predictions` (i.e. Method 2) for boosted regression trees and my `xgb_fit` object. My questions are two-fold:
- Where in `xgb_fit` are the predictions from the trained model (i.e. where is the equivalent of `rf_fit$fit$predictions` that we get for random forest models)? Or, what do I need to add to get those predictions outputted?
- If the above is possible, how should I interpret these predictions? Are they different from calling `predict()`? If so, what do they represent (I gather out-of-bag estimates are non-trivial for boosted regression trees)?
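In case it's useful, here's a check I sketched of my current understanding: if recomputing the training logloss from `predict()` reproduces the iteration-1000 value in the evaluation log, that would suggest `predict()` on the training data is already the "Method 2" equivalent, and nothing extra is cached in the fitted object. (This assumes `"PS"` is the first factor level of `cells$class`, as in the reprex below; the binary logloss is symmetric in which class is treated as the event, so the comparison should hold either way.)

``` r
# Sketch: recompute the training logloss from predict() and compare it
# with the final entry of xgb_fit$fit$evaluation_log.
probs <- predict(xgb_fit, cells, type = "prob")
y <- as.integer(cells$class == "PS")

# Standard binary logloss, computed by hand from the predicted
# probability of the "PS" class:
manual_logloss <- -mean(y * log(probs$.pred_PS) + (1 - y) * log(1 - probs$.pred_PS))

# If these match (~0.001915), predict() is reproducing the
# iteration-1000 training predictions:
manual_logloss
tail(xgb_fit$fit$evaluation_log, 1)
```
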
(Basically, I'd like the predictions from the model that produced the `training_logloss` value at iteration 1000 of the evaluation log shown in the reprex below.)
``` r
# Load required libraries
library(tidymodels)
library(modeldata)
#> Registered S3 method overwritten by 'tune':
#>   method                   from
#>   required_pkgs.model_spec parsnip

# Set seed
set.seed(123)

# Load in data
data(cells, package = "modeldata")

# Define Random Forest model
rf_mod <- rand_forest(trees = 1000) %>%
  set_mode("classification") %>%
  set_engine("ranger")

# Define BRT model
xgb_mod <- boost_tree(trees = 1000) %>%
  set_mode("classification") %>%
  set_engine("xgboost", objective = "binary:logistic", eval_metric = "logloss")

# Fit the models to training data
rf_fit <- rf_mod %>% fit(class ~ ., data = cells)
xgb_fit <- xgb_mod %>% fit(class ~ ., data = cells)

xgb_fit$fit$evaluation_log
#>       iter training_logloss
#>    1:    1         0.542353
#>    2:    2         0.443275
#>    3:    3         0.382232
#>    4:    4         0.333377
#>    5:    5         0.303415
#>   ---
#>  996:  996         0.001918
#>  997:  997         0.001917
#>  998:  998         0.001917
#>  999:  999         0.001916
#> 1000: 1000         0.001915

# Examine output predictions on training data for RANDOM FOREST model
rf_whole <- predict(rf_fit, cells, type = "prob") # predictions based on whole fitted model
rf_oob <- head(rf_fit$fit$predictions)            # predictions based on out-of-bag samples

## These are different to each other, as we would expect
rf_whole$.pred_PS
#> 0.9229111
rf_oob[1, "PS"]
#>        PS
#> 0.8503902

# Examine output predictions on training data for BOOSTED REGRESSION TREE model
xgb_whole <- predict(xgb_fit, cells, type = "prob")
```
Created on 2021-10-05 by the reprex package (v2.0.1)
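For the per-iteration side of this, one thing I've tried is `parsnip::multi_predict()`, which for the xgboost engine accepts a `trees` argument (sketched below; I'm not sure this is the intended route, and I'm assuming it re-predicts using only the first `trees` boosting rounds):

``` r
# Sketch: multi_predict() re-predicts with only the first n boosting
# rounds, which I think gives the training predictions behind the
# corresponding rows of xgb_fit$fit$evaluation_log.
iter_preds <- multi_predict(xgb_fit, cells, type = "prob", trees = c(1, 5, 1000))

# .pred is a list-column: for each row of `cells`, a small tibble with
# one row of class probabilities per requested tree count.
iter_preds$.pred[[1]]
```
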