Multiple logistic regression model + risk scores to calculate employee attrition

It's worth getting a copy of Applied Logistic Regression, 3rd Edition, by David W. Hosmer Jr., Stanley Lemeshow and Rodney X. Sturdivant (2013), especially since you have a stats background. Unfortunately, it's code agnostic, with no examples in any language. I'm working to remedy that in R, but it's slow going.

The rule of thumb for categorical variables is to treat them as continuous if they have more than a dozen or so levels, and to create dummy binary variables if they do not.

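For example, if a variable can take on one of three values, say red, yellow, or blue, you would create substitute binary variables with those names. With an intercept in the model, only two of the three are needed; the third becomes the reference level that the other two are compared against.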
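The dummy coding described above can be sketched in a few lines. This is a minimal illustration in plain Python rather than R, and `dummy_code` and its `reference` argument are hypothetical names made up for the example, not part of any library. Passing a reference level drops that column, which is the usual coding when the model has an intercept.

```python
def dummy_code(values, reference=None):
    """Return (column_names, rows) of 0/1 indicators for a categorical variable.

    If `reference` is given, that level gets no column of its own, so its
    rows are all zeros. Levels are ordered alphabetically.
    """
    levels = sorted(set(values))
    if reference is not None:
        if reference not in levels:
            raise ValueError(f"unknown reference level: {reference!r}")
        levels.remove(reference)
    # One row per observation, one 0/1 entry per remaining level.
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


colours = ["red", "yellow", "blue", "red"]
names, rows = dummy_code(colours, reference="blue")
print(names)  # ['red', 'yellow']
for colour, row in zip(colours, rows):
    print(colour, row)
```

In R you would normally not do this by hand: storing the variable as a factor and passing it to the model formula produces an equivalent coding.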

The risk metric that comes out of a logistic regression is the odds ratio, which is just what it sounds like: the ratio of the odds of the outcome Y at two values of an independent variable. An odds ratio of 1 means that the odds of Y are the same with or without the independent variable in question, holding the rest of X_1, ..., X_n fixed. OR > 1 means higher odds and OR < 1 lower odds: 1.5 means the odds are one and a half times as high, 0.5 means half as high, etc.
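To make the scale concrete: the odds ratio for a predictor is the exponential of its fitted coefficient, so a coefficient of 0 corresponds to an odds ratio of 1 (no association). Here is a small sketch in plain Python. The function names and the attrition rates are hypothetical, chosen only for illustration.

```python
import math


def odds(p):
    """Convert a probability p (0 <= p < 1) to odds, p / (1 - p)."""
    return p / (1.0 - p)


def odds_ratio(p_with, p_without):
    """Odds of the outcome with the factor present, relative to absent."""
    return odds(p_with) / odds(p_without)


def coefficient_to_odds_ratio(beta):
    """A logistic regression coefficient is a log odds ratio."""
    return math.exp(beta)


# Hypothetical attrition rates: 30% with overtime, 20% without.
print(odds_ratio(0.30, 0.20))

# A coefficient of 0 means no change in the odds.
print(coefficient_to_odds_ratio(0.0))  # 1.0

# A coefficient of log(1.5) means the odds are 1.5 times as high.
print(coefficient_to_odds_ratio(math.log(1.5)))
```

Note that an odds ratio is a statement about odds, not probabilities: the two are close only when the outcome is rare.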

If you have enough historical data, you'll want to partition it into a training set and a validation set, fit the model on the former, and use goodness-of-fit tests on the latter to see how well the model does in practice.
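As a rough sketch of that workflow, again in plain Python for illustration: the split below is a simple seeded shuffle, and the goodness-of-fit measure is the Hosmer-Lemeshow statistic, which is an assumption about which test is meant. The function names, the 30% validation fraction and the choice of 10 groups are all placeholders. The model fitting itself, and the chi-square lookup that turns the statistic into a p-value, are left out.

```python
import random


def train_validation_split(rows, validation_fraction=0.3, seed=0):
    """Shuffle a copy of `rows` and return (training, validation) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


def hosmer_lemeshow_statistic(outcomes, predicted, groups=10):
    """Hosmer-Lemeshow statistic for 0/1 outcomes and predicted probabilities.

    Observations are sorted by predicted probability and cut into `groups`
    roughly equal chunks; each chunk contributes
    (observed - expected)^2 / (expected * (1 - expected / size)).
    """
    pairs = sorted(zip(predicted, outcomes), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)  # expected number of events
        observed = sum(y for _, y in chunk)  # observed number of events
        denominator = expected * (1.0 - expected / size)
        if denominator > 0:
            statistic += (observed - expected) ** 2 / denominator
    return statistic


training, validation = train_validation_split(range(100))
print(len(training), len(validation))  # 70 30
```

A large statistic on the validation set suggests the predicted probabilities are poorly calibrated there, even if the model looked fine on the training data.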

