Hi there! Thanks as ever for all the incredible work that's gone into creating the tidymodels framework; I can't convey how useful it's been to my research!

My question is about using `xgboost`: specifically, how can I access the underlying model's predictions on (i.e. fit to) the training data, *without* using `predict()`?

To clarify what I mean: when fitting a random forest model, I can explore the fitted model (`rf_fit` in the reprex below) and its predictions on the training data in two ways:

- Method 1: using `predict()`, i.e. calling `predict(rf_fit, cells, type = "prob")`.
- Method 2: getting predictions from `rf_fit` directly, via `rf_fit$fit$predictions`.

These result in different predictions for reasons that have been clarified here.

In this case, I'm particularly interested in the equivalent of `rf_fit$fit$predictions` (i.e. Method 2) for boosted regression trees and my `xgb_fit` object. My questions are two-fold:

- Where in `xgb_fit` are the predictions from the trained model? (I.e., where is the equivalent of `rf_fit$fit$predictions` that we get for random forest models?) Or, what do I need to add to get those predictions outputted?
- If the above is possible, how should I interpret these predictions? Are they different from calling `predict()`? If so, what do they represent? (I gather out-of-bag estimates are non-trivial for boosted regression trees.)

(Basically, I'd like the predictions from the model that produced the `training_logloss` error at iteration 1000 of `xgb_fit$fit$evaluation_log`.)

```
# Load required libraries
library(tidymodels)
library(modeldata)
#> Registered S3 method overwritten by 'tune':
#> method from
#> required_pkgs.model_spec parsnip
# Set seed
set.seed(123)
# Load in data
data(cells, package = "modeldata")
# Define Random Forest Model
rf_mod <- rand_forest(trees = 1000) %>%
  set_mode("classification") %>%
  set_engine("ranger")
# Define BRT Model
xgb_mod <- boost_tree(trees = 1000) %>%
  set_mode("classification") %>%
  set_engine("xgboost",
             objective = "binary:logistic",
             eval_metric = "logloss")
# Fit the models to training data
rf_fit <- rf_mod %>%
  fit(class ~ ., data = cells)
xgb_fit <- xgb_mod %>%
  fit(class ~ ., data = cells)
xgb_fit$fit$evaluation_log
#> iter training_logloss
#> 1: 1 0.542353
#> 2: 2 0.443275
#> 3: 3 0.382232
#> 4: 4 0.333377
#> 5: 5 0.303415
#> ---
#> 996: 996 0.001918
#> 997: 997 0.001917
#> 998: 998 0.001917
#> 999: 999 0.001916
#> 1000: 1000 0.001915
# Examine output predictions on training data for RANDOM FOREST Model
rf_whole <- predict(rf_fit, cells, type = "prob") # predictions based on whole fitted model
rf_oob <- head(rf_fit$fit$predictions) # predictions based on out of bag samples
## these are different to each other as we would expect
rf_whole$.pred_PS[1]
#> [1] 0.9229111
rf_oob[1, "PS"]
#> PS
#> 0.8503902
# Examine output predictions on training data for BOOSTED REGRESSION TREE Model
xgb_whole <- predict(xgb_fit, cells, type = "prob")
```

<sup>Created on 2021-10-05 by the reprex package (v2.0.1)</sup>