Need for Feature Scaling in Linear Regression

Hello,
I am becoming familiar with different statistical models starting with linear regression (simple and multiple). I understand that many statistical models are sensitive to scaling (those that are distance based).

There are different types of scaling (standardization, centering, normalization, etc.). Scaling can certainly help with visualizations if one variable has a range that is much larger than the other (e.g., in a scatterplot).

My question: does linear regression (simple or multiple) work better if the explanatory variables X and the response variable Y are first scaled? Or is scaling only necessary if the ranges of the X and Y variables are VERY different? And what type of scaling would be the most appropriate?

I understand there is not a single scaling solution but I wonder what is the best way to think and approach scaling...

Thank you!

The types of scaling you mention apply linear (technically affine) transformations to the variables. In a linear regression, this makes no substantive difference: the fit is equivalent and the coefficients simply rescale to compensate. One does have to be careful about the interpretation of coefficients, of course. If one divides an independent variable by 2, then the coefficient on that variable will exactly double.
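As a quick check of the doubling claim, here is a minimal sketch in plain Python (standard library only, rather than R). The data values and the helper name `fit_simple_ols` are made up for illustration; the helper uses the textbook closed-form formulas for simple least squares.

```python
def fit_simple_ols(x, y):
    """Closed-form simple least squares: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Illustrative data only (not from the thread).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Fit on the original predictor, then on the predictor divided by 2.
b0, b1 = fit_simple_ols(x, y)
x_half = [xi / 2 for xi in x]
b0_h, b1_h = fit_simple_ols(x_half, y)

# Fitted values from each model, evaluated on its own version of x.
fitted = [b0 + b1 * xi for xi in x]
fitted_h = [b0_h + b1_h * xi for xi in x_half]

print("slope, original x:", b1)
print("slope, x / 2:     ", b1_h)
print("largest difference in fitted values:",
      max(abs(f - g) for f, g in zip(fitted, fitted_h)))
```

The slope doubles, the intercept is unchanged, and the fitted values agree up to floating-point error, so only the interpretation of the coefficient changes.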

Sometimes scaling can help with the numerical properties of the calculations behind the linear regression. With modern software such as R, this is very unlikely to be an issue.

It depends on the estimation method. For ordinary least squares, there is no requirement to do so.

For any penalized model (e.g., glmnet, lasso, lars, ridge regression, principal component regression, etc.), you should have the predictors on the same scale.

In tidymodels, the man pages for each model type tell you when you should use such a method:

Preprocessing requirements

Predictors should have the same scale. One way to achieve this is to center and scale each so that each predictor has mean zero and a variance of one.
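To see why the penalty makes scale matter, here is a small sketch of ridge regression with two predictors in plain Python (standard library only, not glmnet or tidymodels). The data, the penalty value `lam = 1.0`, and the helper names are all assumptions chosen for illustration. It solves the 2x2 ridge normal equations on centered data and leaves the intercept unpenalized.

```python
def center(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def standardize(v):
    """Center and scale to mean zero, variance one (sample variance)."""
    c = center(v)
    sd = (sum(ci ** 2 for ci in c) / (len(v) - 1)) ** 0.5
    return [ci / sd for ci in c]

def ridge_fitted(x1, x2, y, lam):
    """Fitted values from a two-predictor ridge fit.

    Solves (X'X + lam * I) b = X'y on centered data; lam = 0 gives
    ordinary least squares.
    """
    a, b, c = center(x1), center(x2), center(y)
    s11 = sum(p * p for p in a) + lam
    s22 = sum(q * q for q in b) + lam
    s12 = sum(p * q for p, q in zip(a, b))
    t1 = sum(p * r for p, r in zip(a, c))
    t2 = sum(q * r for q, r in zip(b, c))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * t1 - s12 * t2) / det
    b2 = (s11 * t2 - s12 * t1) / det
    mean_y = sum(y) / len(y)
    return [mean_y + b1 * p + b2 * q for p, q in zip(a, b)]

def max_diff(u, v):
    return max(abs(ui - vi) for ui, vi in zip(u, v))

# Illustrative data only (not from the thread).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.1, 2.9, 7.2, 6.8, 11.1, 10.9]
x1_rescaled = [v / 1000 for v in x1]  # same quantity, different units
lam = 1.0  # arbitrary penalty, for illustration

# Least squares (lam = 0): the change of units does not alter the fit.
ols_gap = max_diff(ridge_fitted(x1, x2, y, 0.0),
                   ridge_fitted(x1_rescaled, x2, y, 0.0))

# Ridge on raw predictors: the change of units alters the fit.
ridge_gap = max_diff(ridge_fitted(x1, x2, y, lam),
                     ridge_fitted(x1_rescaled, x2, y, lam))

# Ridge after standardizing: the units no longer matter.
std_gap = max_diff(
    ridge_fitted(standardize(x1), standardize(x2), y, lam),
    ridge_fitted(standardize(x1_rescaled), standardize(x2), y, lam))

print("OLS gap:", ols_gap)
print("ridge gap, raw predictors:", ridge_gap)
print("ridge gap, standardized predictors:", std_gap)
```

The penalty acts on the size of the coefficients, and that size depends on each predictor's units, so the unstandardized ridge fit changes when `x1` is re-expressed. Centering and scaling each predictor first removes that dependence.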

I hope this is a typo, with the word "no" omitted.

Correct. I've edited it.

This would be a nice query to test with some sample data.
