How does tidymodels handle missing data during training and prediction?

marioem · September 29, 2025, 8:59am

Apparently, ranger requires missing data to be imputed before training, and at prediction time it uses partial voting when missing data is present.
I've trained a ranger random forest classification workflow (tuned with tune_bayes) with minimal preprocessing in the recipe: only re-leveling the outcome. The numerical predictors contained about 30% missing values (all missing values were in the same rows across predictors). The model trained well, with decent results.
How was this missing data handled? Was it silently dropped or imputed? I extracted the analysis splits from the workflow, and the NAs are still there. Predicting (with NAs) on the extracted engine gave exactly the same results as those in extract_predictions, suggesting that ranger's native partial voting was used.
If the NAs are not passed transparently to the engine, it would also be great to have this documented in parsnip's table of supported engines.

Thanks,
Mariusz