xgb.Booster models extracted from tidymodels workflows no longer give correct predictions or shapviz results

I have a tidymodels workflow I built back in 2024 that I previously used to do the following steps:

  1. Fit a set of XGBoost models to my training data.
  2. Get predictions for my full set of training + test data.
  3. Extract the fit models for later use with shapviz to visualize variable importance.
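The setup looks roughly like this (a sketch only — `train_data`, `target`, and `all_data` are placeholders, and my real recipe and model spec are more involved):

```r
library(tidymodels)

# 1. Fit an XGBoost regression via a workflow
spec <- boost_tree(mode = "regression") |>
  set_engine("xgboost")

rec <- recipe(target ~ ., data = train_data) |>
  step_dummy(all_nominal_predictors(), one_hot = TRUE)

wf_fit <- workflow() |>
  add_recipe(rec) |>
  add_model(spec) |>
  fit(data = train_data)

# 2. Predictions for the full training + test data
preds <- predict(wf_fit, new_data = all_data)

# 3. Extract the underlying xgb.Booster for later use with shapviz
booster <- extract_fit_engine(wf_fit)
```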

I was following this workflow -- recommended by the creator of shapviz -- to produce and visualize my SHAP values. Note that it uses the extracted fit models to do so, after using the workflow to bake a sample data set into a matrix for shapviz.
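Concretely, the baking step is along these lines (a sketch; `wf_fit` is the fitted workflow, `booster` the extracted xgb.Booster, and `sample_data` a placeholder for the rows I pass to shapviz):

```r
library(shapviz)

# The recipe inside a fitted workflow is already prepped
rec_prepped <- extract_recipe(wf_fit)

# Bake the sample into the numeric matrix that shapviz/xgboost expect
X_baked <- bake(rec_prepped, new_data = sample_data,
                all_predictors(), composition = "matrix")

shp <- shapviz(booster, X_pred = X_baked)
sv_importance(shp)
```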

However, when I use this workflow now, the extracted xgb.Booster models no longer produce the same predictions as a call to predict on the fitted workflow object, and consequently the SHAP values are completely off. As an example, my model's numeric targets generally fall between 0 and 20 (with large positive outliers), yet the largest mean SHAP values from shapviz are 27-28 in some instances. Force plots and waterfall plots show the extracted model predicting values (F(X)) of 89-150 for my model outputs, while E(F(X)) -- ~5.5 -- matches what I see in my data. Tests with the predict function confirm that the extracted model is returning these large predictions, matching the F(X) shown in shapviz.

Comparing predict on the tidymodels fit vs. predict on the extracted model for a freshly run model -- using the exact workflow that fit the model to bake the prediction matrix -- yields the same result: predict(fit) returns values in the range of the data, while predict(extracted) returns values massively larger than the data.
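The comparison I ran is essentially this (sketch; `wf_fit` is the fitted workflow, `booster` the extracted xgb.Booster, and `check_data`/`X_baked` the same rows raw vs. baked through the workflow's recipe):

```r
p_wf  <- predict(wf_fit, new_data = check_data)$.pred  # in the range of the outcome
p_xgb <- predict(booster, newdata = X_baked)           # massively larger than the outcome

summary(p_wf)
summary(p_xgb)
range(p_xgb - p_wf)  # large, systematic differences
```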

Specifications: I am using XGBoost in regression mode with a non-negative integer target. I have between 200 and 205 predictors of various types, including integers, doubles, and factors. Some values are missing, represented as NA. The only data preparation step in my tidymodels workflow is converting the factor predictors to one-hot-encoded dummy variables; I am not normalizing or otherwise transforming any of the variables.
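In recipe terms, the entire preprocessing is a single step (sketch; `target` and `train_data` are placeholders, and nothing is imputed because xgboost handles NAs natively):

```r
rec <- recipe(target ~ ., data = train_data) |>
  step_dummy(all_nominal_predictors(), one_hot = TRUE)
# NAs in the integer/double predictors are left as-is for xgboost
```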

I have tried to create a reproducible example using the iris dataset, but it does not show the same behavior. Unfortunately, the data I am using for my models is covered by HIPAA, so I can't share even a subset of it in a reproducible example.
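For reference, my attempted reprex was along these lines -- and here predict on the workflow and predict on the extracted booster agree, unlike with my real data:

```r
library(tidymodels)

rec <- recipe(Sepal.Length ~ ., data = iris) |>
  step_dummy(all_nominal_predictors(), one_hot = TRUE)

wf_fit <- workflow() |>
  add_recipe(rec) |>
  add_model(boost_tree(mode = "regression") |> set_engine("xgboost")) |>
  fit(data = iris)

X <- bake(extract_recipe(wf_fit), new_data = iris,
          all_predictors(), composition = "matrix")

all.equal(predict(wf_fit, new_data = iris)$.pred,
          as.numeric(predict(extract_fit_engine(wf_fit), newdata = X)))
# the two agree here, so this example does not reproduce the problem
```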