I thought I would share this with the community.
vtreat is a package for systematically preparing data for supervised machine learning tasks such as classification or regression. vtreat designs a data transform that takes in messy data (with missing values and high-cardinality categorical variables) and delivers transformed data that is purely numeric and has no missing values. The transformation is designed to retain almost all of the information relating the explanatory variables to the dependent variable, in a model-usable format. This transformation can be saved and then applied to future test or application data.
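To make the "design once, apply later" idea concrete, here is a minimal pure-Python sketch. It is not vtreat's API and not its actual method: the function names `design_treatment` and `apply_treatment`, the column names, and the toy data are all hypothetical, and the package itself does more than this. The sketch only shows the general pattern: learn fill-in values and per-level effects from training data, then reuse that saved plan on new rows so the output is purely numeric with no missing values.

```python
from statistics import mean


def design_treatment(rows, y, numeric_cols, categorical_cols):
    """Learn a treatment plan from training rows (list of dicts) and outcomes y.

    Hypothetical helper for illustration; not part of vtreat.
    """
    grand_mean = mean(y)
    plan = {"grand_mean": grand_mean, "numeric": {}, "categorical": {}}
    # Numeric columns: remember the mean of the observed (non-missing) values.
    for col in numeric_cols:
        observed = [r[col] for r in rows if r.get(col) is not None]
        plan["numeric"][col] = mean(observed) if observed else 0.0
    # Categorical columns: remember, per level, how far the outcome mean
    # sits from the grand mean. A missing value (None) is its own level.
    for col in categorical_cols:
        by_level = {}
        for r, target in zip(rows, y):
            by_level.setdefault(r.get(col), []).append(target)
        plan["categorical"][col] = {
            level: mean(targets) - grand_mean
            for level, targets in by_level.items()
        }
    return plan


def apply_treatment(plan, rows):
    """Apply a saved plan to rows, returning all-numeric rows with no missing values."""
    treated = []
    for r in rows:
        t = {}
        for col, fill in plan["numeric"].items():
            value = r.get(col)
            t[col] = float(fill if value is None else value)
            t[col + "_is_missing"] = 1.0 if value is None else 0.0
        for col, impacts in plan["categorical"].items():
            # Levels never seen in training get a neutral effect of 0.0.
            t[col + "_impact"] = float(impacts.get(r.get(col), 0.0))
        treated.append(t)
    return treated


# Toy training data (made up for this example).
train = [
    {"x": 1.0, "c": "a"},
    {"x": None, "c": "b"},
    {"x": 3.0, "c": "a"},
    {"x": 5.0, "c": None},
]
y = [1.0, 0.0, 1.0, 0.0]

plan = design_treatment(train, y, numeric_cols=["x"], categorical_cols=["c"])

# Later: apply the same saved plan to new data with a missing value
# and a category level that never appeared in training.
treated = apply_treatment(plan, [{"x": None, "c": "zzz"}])
print(treated)
```

The point of keeping `plan` as a separate object is that it can be stored alongside a model and applied unchanged to test or application data, which is the workflow described above.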
If you aren’t using something like vtreat in your data science projects, you are really missing out (and making more work for yourself).
We have some links to new documentation here.