How does tidymodels handle missing data during training and prediction?

Apparently ranger requires missing data to be imputed before training, and at prediction time it uses partial voting when missing data is present.
I've trained a ranger RF classification model workflow (tune_bayes) with minimal preprocessing in the recipe: only re-leveling the outcome. About 30% of the values in the numerical predictors were missing (all missing values were in the same rows across predictors). The model trained well, with decent results.
How was this missing data handled? Was it silently dropped or imputed? I extracted the analysis splits from the workflow, and the NAs are still there. Predicting (with NAs) on an extracted engine gave exactly the same results as were present in extract_predictions, suggesting ranger's native partial voting was employed.
If the NAs are not transparently passed to the engine, it would also be great to have this documented in the parsnip supported engines' table.

Thanks,
Mariusz

We can do that.

?ranger::ranger has:

`na.action`: Handling of missing values. Set to "na.learn" to internally handle missing values (default, see below), to "na.omit" to omit observations with missing values and to "na.fail" to stop if missing values are found.

then later...

Missing values can be internally handled by setting na.action = "na.learn" (default), by omitting observations with missing values with na.action = "na.omit" or by stopping if missing values are found with na.action = "na.fail". With na.action = "na.learn", in each node either all missings go left or all missings go right. The direction is chosen based on the split criterion value (i.e., decrease of impurity). For prediction, this direction is saved as the "default" direction. If a missing occurs in prediction at a node where there is no default direction, it goes left.
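
To make the "na.learn" description above more concrete, here is a rough base R sketch of the idea for a single node. This is not ranger's actual code: the helper names (`gini`, `impurity_decrease`, `choose_na_direction`, `route`) are made up for illustration, Gini impurity is just one possible split criterion, and sending ties to the left is my own assumption.

```r
# Hypothetical helper: Gini impurity of a vector of class labels.
gini <- function(y) {
  if (length(y) == 0) return(0)
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Hypothetical helper: decrease of impurity for a left/right partition.
impurity_decrease <- function(y, go_left) {
  n <- length(y)
  gini(y) -
    (sum(go_left) / n) * gini(y[go_left]) -
    (sum(!go_left) / n) * gini(y[!go_left])
}

# For one candidate split (x <= threshold), try sending all missing values
# left, then all right, and keep the direction with the larger decrease.
# Ties go left here; that tie-break is an assumption of this sketch.
choose_na_direction <- function(x, y, threshold) {
  observed_left <- !is.na(x) & x <= threshold
  dec_left  <- impurity_decrease(y, observed_left | is.na(x))
  dec_right <- impurity_decrease(y, observed_left)
  if (dec_left >= dec_right) "left" else "right"
}

# At prediction time, a missing value follows the saved default direction.
# If the node has no default direction (NULL), it goes left.
route <- function(value, threshold, default_direction = NULL) {
  if (!is.na(value)) return(if (value <= threshold) "left" else "right")
  if (is.null(default_direction)) "left" else default_direction
}

x <- c(1, 2, NA, 8, 9, NA)
y <- factor(c("a", "a", "b", "b", "b", "b"))

direction <- choose_na_direction(x, y, threshold = 5)
direction                                                # "right"
route(NA, threshold = 5, default_direction = direction)  # "right"
route(NA, threshold = 5)                                 # "left"
```

In this toy data both rows with a missing `x` belong to class "b", so sending the missings right gives two pure child nodes and that direction is saved. The last call shows the fallback from the documentation: a missing value at a node with no default direction goes left.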
