The xgboost documentation cautions against using saveRDS for storing trained models for future scoring, recommending xgb.save instead (Introduction to Model IO — xgboost 1.6.1 documentation):
We guarantee backward compatibility for models but not for memory snapshots. Models (trees and objective) use a stable representation, so that models produced in earlier versions of XGBoost are accessible in later versions of XGBoost. If you'd like to store or archive your model for long-term storage, use save_model (Python) and xgb.save (R).
On the other hand, a memory snapshot (serialisation) captures many things internal to XGBoost, and its format is not stable and is subject to frequent changes. Therefore, a memory snapshot is suitable for checkpointing only, where you persist the complete snapshot of the training configuration so that you can recover robustly from possible failures and resume the training process. Loading a memory snapshot generated by an earlier version of XGBoost may result in errors or undefined behaviors. If a model is persisted with pickle.dump (Python) or saveRDS (R), then the model may not be accessible in later versions of XGBoost.
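To make the distinction concrete, here is a minimal sketch of the two approaches in R. It assumes the xgboost package is installed; the toy agaricus model and the file names are illustrative, not taken from the quoted documentation.

```r
library(xgboost)

# Train a small model on the built-in agaricus data (illustrative only)
data(agaricus.train, package = "xgboost")
bst <- xgboost(
  data      = agaricus.train$data,
  label     = agaricus.train$label,
  nrounds   = 5,
  objective = "binary:logistic",
  verbose   = 0
)

# Recommended for long-term storage: the stable, version-portable model format
xgb.save(bst, "model.json")
bst_reloaded <- xgb.load("model.json")

# Discouraged for long-term storage: an R memory snapshot of the booster object,
# which may not load cleanly under a later xgboost version
saveRDS(bst, "model.rds")
```

The file written by xgb.save is the representation the documentation calls stable, while the .rds file is an R-level snapshot of the booster object and everything attached to it.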
Does this mean we should also avoid serializing tidymodels workflow objects with saveRDS, given the possibility that an upgraded version of xgboost will be used for scoring in the future? For context, below is a sketch of the workaround I'm considering.
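The idea is to pull the raw booster out of the fitted workflow and store it in the xgboost-native format, keeping saveRDS only for the surrounding workflow object. This is a hedged sketch: the boost_tree() spec, the iris example, and the file names are my own illustrative assumptions.

```r
library(tidymodels)
library(xgboost)

# Fit a small xgboost-backed workflow (illustrative)
spec <- boost_tree(trees = 20) |>
  set_engine("xgboost") |>
  set_mode("classification")

wf_fit <- workflow() |>
  add_formula(Species ~ .) |>
  add_model(spec) |>
  fit(data = iris)

# Extract the underlying xgb.Booster and save it in the version-stable format
booster <- extract_fit_engine(wf_fit)
xgb.save(booster, "booster.json")

# The rest of the workflow (preprocessing, parsnip metadata) is plain R and can
# still be stored with saveRDS; only the booster seems to need special handling
saveRDS(wf_fit, "workflow.rds")
```

Reassembling the saved booster into the workflow at scoring time presumably takes some care, and I understand packages such as bundle aim to handle this kind of model persistence, though I haven't verified how well any of this survives an xgboost upgrade.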