Hello,
I am becoming familiar with different statistical models, starting with linear regression (simple and multiple). I understand that many statistical models are sensitive to the scale of the variables, particularly those that are distance based.
There are different types of scaling (standardization, centering, normalization, etc.). Scaling can certainly help with visualizations when one variable's range is much larger than another's (e.g., in a scatterplot).
My question: does linear regression (simple or multiple) work better if the explanatory variables X and the response variable Y are scaled first? Or is scaling only necessary if the ranges of the X and Y variables are VERY different? And what type of scaling would be most appropriate?
I understand there is no single scaling solution, but I wonder what the best way is to think about and approach scaling...
The types of scaling you mention are linear (technically, affine) transformations of the variables. In a linear regression, this makes no substantive difference: the fitted values, residuals, and R² are unchanged. One does have to be careful about the interpretation of the coefficients, of course. If one divides an independent variable by 2, then the coefficient on that variable will exactly double.
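As a quick illustration of that invariance (a sketch on synthetic data, using plain NumPy rather than R), fitting by ordinary least squares before and after halving a predictor doubles that coefficient while leaving the fitted values identical:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=100)

# Fit y = b0 + b1*x by ordinary least squares.
X = np.column_stack([np.ones_like(x), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# Refit with the predictor divided by 2: the slope exactly doubles,
# and the fitted values are unchanged.
X_half = np.column_stack([np.ones_like(x), x / 2])
b_half = np.linalg.lstsq(X_half, y, rcond=None)[0]

print(np.isclose(b_half[1], 2 * b[1]))      # True: slope doubled
print(np.allclose(X @ b, X_half @ b_half))  # True: same fitted values
```

The same holds for centering or any other invertible affine transformation of the columns: the least-squares fit itself does not move, only the coefficients' interpretation changes.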
Sometimes scaling can improve the numerical properties of the computations behind a linear regression. With modern software such as R, this is very unlikely to be an issue.
It depends on the estimation method. For ordinary least squares, there is no requirement to scale the variables.
For any penalized model (e.g., glmnet, lasso, lars, ridge regression, principal components regression, etc.), you should have the predictors on the same scale.
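To see why the penalty makes scale matter (a sketch on synthetic data, using a hand-rolled closed-form ridge solution without an intercept rather than glmnet), rescaling one predictor changes the ridge fit, unlike OLS:

```python
import numpy as np

def ridge(X, y, lam):
    # Closed-form ridge solution (no intercept, for simplicity):
    # b = (X'X + lam*I)^{-1} X'y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.0, -2.0]) + rng.normal(scale=0.1, size=100)

b = ridge(X, y, lam=10.0)

# Rescale the first column by 100. Unlike OLS, the ridge fitted
# values change, because the penalty shrinks all coefficients
# equally regardless of the predictors' units.
X2 = X.copy()
X2[:, 0] *= 100.0
b2 = ridge(X2, y, lam=10.0)

print(np.allclose(X @ b, X2 @ b2))  # False: the fit depends on scale
```

Putting the predictors on a common scale first makes the penalty treat them symmetrically, which is why the scaling step is part of the standard workflow for these models.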
In tidymodels, the man pages for each model type tell you when you should use such a method:
Preprocessing requirements
Predictors should have the same scale. One way to achieve this is to center and scale each so that each predictor has mean zero and a variance of one.