Plotting both the testing and training evaluation logs from an XGBoost last_fit

HanLum · July 29, 2024, 10:53am

I'm trying to make a plot to check for overfitting using the evaluation logs from training and testing my XGBoost model (built with tidymodels in R), but I can only plot the training evaluation log after running last_fit (which fits on the training set and evaluates on the testing set, equivalent to fit() then predict()). I've found that 'learning curves' can be calculated in Python using model.evals_result(), but I don't know what this corresponds to in R. I'm not even sure there should be a testing evaluation log, as last_fit only fits to training and evaluates on testing. Does anyone know if it's possible to retrieve both the training and testing evaluation logs from a last_fit XGBoost model?

The only way I could plot both training and testing logloss was by also fitting to the test set only (fit() to testing combined with last_fit()). I'm not sure if this was the correct thing to do. Any thoughts or help would be very much appreciated!

abuislam · August 25, 2024, 7:22pm

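To plot both training and testing evaluation logs for XGBoost in R (this uses the xgboost package directly rather than a tidymodels wrapper):

Train Model: Use xgb.train() with a watchlist that contains both the training and testing sets, so a metric is logged for each.
Retrieve Logs: Extract logs using xgb_model$evaluation_log.
Plot: Use ggplot2 to plot training and testing metrics.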

Example:

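library(xgboost)
library(ggplot2)

# train_data and test_data must be xgb.DMatrix objects that include labels

# Train model, logging log loss on both sets at every boosting round
xgb_model <- xgb.train(
  params = list(objective = "binary:logistic", eval_metric = "logloss"),
  data = train_data,
  nrounds = 100,  # required argument; 100 is an arbitrary choice
  watchlist = list(train = train_data, test = test_data)  # called evals in newer xgboost releases
)

# Extract and plot logs (one row per iteration, one column per set and metric)
evals <- xgb_model$evaluation_log
ggplot(evals, aes(x = iter)) +
  geom_line(aes(y = train_logloss, color = 'Train')) +
  geom_line(aes(y = test_logloss, color = 'Test')) +
  labs(x = 'Iteration', y = 'Log Loss', color = 'Data')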
This plots both training and testing log loss to check for overfitting.
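Beyond eyeballing the plot, the same log can be summarised numerically. The following is only a rough sketch in base R: the data frame is a made-up stand-in for xgb_model$evaluation_log (the values are illustrative, not real results), and the names best_iter, gap and overfit_rounds are hypothetical helpers, not part of xgboost.

```r
# Illustrative stand-in for xgb_model$evaluation_log (values are made up)
evals <- data.frame(
  iter          = 1:6,
  train_logloss = c(0.60, 0.45, 0.35, 0.28, 0.22, 0.18),
  test_logloss  = c(0.62, 0.50, 0.46, 0.45, 0.47, 0.50)
)

# Iteration with the lowest test log loss
best_iter <- evals$iter[which.min(evals$test_logloss)]

# Gap between test and training loss at each iteration;
# a steadily widening gap is one sign of overfitting
evals$gap <- evals$test_logloss - evals$train_logloss

# Iterations after the best one, where test loss has risen above
# its minimum while training loss keeps falling
overfit_rounds <- evals$iter[evals$iter > best_iter]

print(best_iter)
print(evals)
```

If the test loss bottoms out well before the last round, as in this toy log, stopping near best_iter is a reasonable starting point for the number of boosting rounds.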

HanLum · August 28, 2024, 8:16pm

Thank you so much for your response @abuislam !!
That is exactly what I need, but unfortunately I'm using tidymodels rather than the xgboost package directly. Do you know how to extract both logs from a tidymodels fit?
I used k-fold cross-validation for model training, so I presume I need the analysis and assessment results from there.
