Feature engineering with proportional data

I'm working on an ML project where some of the features are population demographics, reported as a percentage of the total population. Looking at some of the demographic distributions, there appear to be non-linear relationships with the predicted variable. In particular, things get wonky near 0% and 100%.


Are there any recommended first steps for feature engineering with data on a 0-1 scale? My first thought is to create new features by running the demographic data through the logit function, which maps values from the interval (0, 1) to (-inf, inf), but I'd happily welcome other ideas. A cursory Google search didn't turn up any general guidance.
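For concreteness, here is a minimal sketch of the logit idea in Python with NumPy. The function name `logit_feature`, the `eps` parameter, and the sample values are made up for illustration. Clipping is only one possible way (an assumption here, not a recommendation from any source) to handle proportions of exactly 0 or 1, where the logit is undefined.

```python
import numpy as np

def logit_feature(p, eps=1e-6):
    """Map proportions in [0, 1] to the real line with the logit transform.

    eps is an illustrative choice: values are clipped to [eps, 1 - eps]
    so that exact 0s and 1s do not produce -inf or +inf.
    """
    p = np.asarray(p, dtype=float)
    clipped = np.clip(p, eps, 1.0 - eps)
    # logit(p) = log(p / (1 - p))
    return np.log(clipped / (1.0 - clipped))

# Hypothetical demographic shares, including the problematic endpoints.
shares = np.array([0.0, 0.02, 0.5, 0.98, 1.0])
logit_shares = logit_feature(shares)
```

The transform stretches the regions near 0 and 1 and leaves the middle of the range roughly linear, so it is worth checking whether the choice of `eps` ends up driving the values at the extremes.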


Thanks ahead of time!

You might check out the approach we take in the dimensionality reduction chapter of Tidy Modeling with R (TMwR), where some predictors are ratios.

Thanks @julia ! This helps a ton!!
