As a bit of a follow-up to my previous question, I've seen disagreement online about whether I should center/scale my dummy variables prior to modeling (see the reprexes in the link above for an example). Andrew Gelman seems to say that I shouldn't, but Rob Tibshirani seems to say that I should.
Does anyone have experience with this? Would the answer differ depending on whether I was using glmnet/LASSO versus keras/a neural network?
(One of my favorite things about tree-based models like xgboost is that I don't have to think about these issues as much.)
Yes, when the model requires the predictors to be on the same scale:

- Regularized models (glmnet and the like) put penalties on the sum of the absolute (or squared) slope values, so coefficients on differently scaled predictors get penalized unevenly.
- Nearest-neighbor models use distance values, and kernel methods (e.g., SVMs) use dot products; there is a quick numeric illustration of this just below the list.
- Neural networks usually initialize the weights with random numbers and assume the predictors are on a common scale.
- PLS models chase covariance and assume that the variances are the same.
- And so on.
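To make the distance/dot-product point concrete, here is a toy example in base R (the numbers are made up for illustration, not from the thread):

```r
# A predictor measured on a large scale dominates Euclidean distance,
# which is what nearest-neighbor and kernel methods consume.
x <- rbind(a = c(income = 50000, age = 30),
           b = c(income = 51000, age = 60))
dist(x)         # ~1000: driven almost entirely by income
dist(scale(x))  # 2: after centering/scaling, both predictors contribute equally
```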
There is a decent argument for scaling them all to a variance of two but, regardless, for some models you will harm performance if you do not normalize the predictors as needed. A minimal preprocessing sketch is below.
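If it helps, here is one way to set this up in R, assuming the recipes and glmnet packages and using mtcars purely as a stand-in dataset (none of these names come from the original thread):

```r
library(recipes)
library(glmnet)

# Stand-in data: treat cyl as categorical so it gets dummy-encoded.
dat <- mtcars
dat$cyl <- factor(dat$cyl)

rec <- recipe(mpg ~ ., data = dat) |>
  step_dummy(all_nominal_predictors()) |>  # make 0/1 indicator columns
  step_normalize(all_predictors())         # center/scale everything, dummies included

prepped <- prep(rec, training = dat)
baked   <- bake(prepped, new_data = NULL)

x <- as.matrix(baked[, setdiff(names(baked), "mpg")])
y <- baked$mpg

# glmnet standardizes internally by default; turn that off here since the
# predictors were already normalized in the recipe.
fit <- cv.glmnet(x, y, alpha = 1, standardize = FALSE)
```

The step_normalize(all_predictors()) line is where the choice you're asking about lives: restrict it to the non-dummy columns if you'd rather leave the indicators at 0/1.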
Agreed! Low maintenance is the way to go initially.