Feature engineering with proportional data

markjrieke · September 28, 2021, 7:13pm

I'm working on an ML project where some of the features are population demographics, reported as a % of the total population. Looking at some of the demographic distributions, there may be some non-linear relationships with the predicted variable, i.e., things get wonky near 0% & 100%.

Are there any recommended first steps for feature engineering w/data on a 0-1 scale? My first thought is to create new features by running the demographic data through the logit function, which maps values in (0, 1) onto (-inf, inf), but I'd happily welcome other ideas. I did a cursory Google search & didn't come across any general info quickly.
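For reference, here is a minimal sketch of the logit idea in Python. The function name `clipped_logit`, the `eps` value, and the sample shares are all illustrative assumptions, not something from the project itself. The one wrinkle it shows is that the logit is undefined at exactly 0 and 1, so the proportions are nudged into [eps, 1 - eps] first; the right `eps` depends on the data.

```python
import math


def clipped_logit(p, eps=1e-6):
    """Return log(p / (1 - p)) after clipping p into [eps, 1 - eps].

    `eps` is an assumed tolerance; the plain logit is undefined at
    exactly 0 and 1, so some adjustment is needed for those values.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"expected a proportion in [0, 1], got {p}")
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


# Hypothetical demographic shares, already on a 0-1 scale
# (divide by 100 first if the data is stored as percentages).
shares = [0.0, 0.02, 0.5, 0.98, 1.0]
logit_features = [clipped_logit(p) for p in shares]
```

The transform stretches the regions near 0 and 1 and leaves the middle of the range roughly linear, which is why it is a common first thing to try for proportions. Values that were clipped all land on the same extreme value, so a large pile-up at exactly 0% or 100% may deserve its own indicator feature.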

Thanks ahead of time!

julia · September 29, 2021, 10:51pm

You might check out the approach we take in the dimensionality reduction chapter of TMwR, where some predictors are ratios.

markjrieke · September 30, 2021, 2:06pm

Thanks @julia! This helps a ton!!
