Multiple regression: p-values for unstandardized data differ from p-values for standardized data

Hi there,

I fitted a multiple regression on the unstandardized data using lm() and got these results:

Call:
lm(formula = reg.modeldef2d, data = dta2)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.48129 -0.47499 -0.03837  0.43007  1.32839 

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)   
(Intercept)                -1.96183    0.68485  -2.865  0.00535 **
WPR.Teiler.theta            0.11308    0.13375   0.845  0.40041   
WPR.Prim.theta             -0.16453    0.09212  -1.786  0.07794 . 
SD.math                     0.38957    0.13936   2.795  0.00650 **
SR3.theta                   0.20984    0.10707   1.960  0.05355 . 
SW2.Teiler.theta            0.05043    0.09504   0.531  0.59716   
SR3.theta:SW2.Teiler.theta  0.22049    0.08912   2.474  0.01550 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6724 on 79 degrees of freedom
Multiple R-squared:  0.4074,	Adjusted R-squared:  0.3624 
F-statistic: 9.052 on 6 and 79 DF,  p-value: 1.583e-07

Then I standardized the data with scale() and fitted the very same model:

Call:
lm(formula = reg.modeldef2d, data = dta2_std)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.75916 -0.56409 -0.04557  0.51075  1.57757 

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)   
(Intercept)                -0.02616    0.08675  -0.302   0.7638   
WPR.Teiler.theta            0.08763    0.10365   0.845   0.4004   
WPR.Prim.theta             -0.17215    0.09639  -1.786   0.0779 . 
SD.math                     0.36673    0.13119   2.795   0.0065 **
SR3.theta                   0.23590    0.11639   2.027   0.0461 * 
SW2.Teiler.theta            0.05409    0.10077   0.537   0.5930   
SR3.theta:SW2.Teiler.theta  0.21524    0.08700   2.474   0.0155 * 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7985 on 79 degrees of freedom
Multiple R-squared:  0.4074,	Adjusted R-squared:  0.3624 
F-statistic: 9.052 on 6 and 79 DF,  p-value: 1.583e-07
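For comparison, here is a minimal R sketch on simulated data showing what standardizing with scale() does to a model with main effects only. The data frame `dta`, its columns `x1`, `x2`, `y` and all values are made up, not taken from the post. Because scale() returns a matrix, the result is converted back with as.data.frame(). In such a model the slope t-values stay identical, and each standardized slope equals the raw slope times sd(x)/sd(y).

```r
# Simulated stand-in for the poster's data; names and values are hypothetical
set.seed(1)
dta <- data.frame(x1 = rnorm(50, mean = 10, sd = 2),
                  x2 = rnorm(50, mean = 5, sd = 3))
dta$y <- 1 + 0.5 * dta$x1 - 0.3 * dta$x2 + rnorm(50)

# scale() centers and scales every column but returns a matrix,
# so convert the result back to a data frame before calling lm()
dta_std <- as.data.frame(scale(dta))

fit_raw <- lm(y ~ x1 + x2, data = dta)
fit_std <- lm(y ~ x1 + x2, data = dta_std)

# Slope t-values (and therefore p-values) are unchanged by standardization
t_raw <- summary(fit_raw)$coefficients[-1, "t value"]
t_std <- summary(fit_std)$coefficients[-1, "t value"]
print(all.equal(t_raw, t_std))

# Each standardized slope is the raw slope rescaled by sd(x) / sd(y)
sds <- sapply(dta[c("x1", "x2")], sd)
print(all.equal(coef(fit_std)[-1], coef(fit_raw)[-1] * sds / sd(dta$y)))
```

In the output above, the terms that are not part of the interaction (WPR.Teiler.theta, WPR.Prim.theta, SD.math) and the interaction itself do keep identical t-values, which fits this pattern.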

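Not all p-values are the same! Those for WPR.Teiler.theta, WPR.Prim.theta, SD.math and the interaction agree, but the p-values for SR3.theta (0.05355 vs. 0.0461) and SW2.Teiler.theta (0.59716 vs. 0.5930) differ, and so does the intercept's. Also, the intercept isn't 0, which it should be with standardized data, shouldn't it?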
I checked for NA values; there are none.

Could the numerical precision of R's variables be the problem? When I rescale the standardized data manually and compute the differences from the original unstandardized values, not all differences are zero, although they are tiny (on the order of 1e-17).
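Differences of that size are floating-point rounding error. A minimal R sketch on a simulated vector (not the poster's data) undoes scale() using the attributes it stores and compares the result with all.equal(), which tests equality up to a numerical tolerance rather than exactly:

```r
set.seed(7)
x <- rnorm(86, mean = 1.5, sd = 0.8)  # simulated, hypothetical values
z <- scale(x)  # one-column matrix with "scaled:center" and "scaled:scale"

# Undo the standardization manually
x_back <- as.vector(z) * attr(z, "scaled:scale") + attr(z, "scaled:center")

print(max(abs(x_back - x)))           # tiny, at the level of machine precision
print(.Machine$double.eps)            # relative precision of doubles
print(isTRUE(all.equal(x_back, x)))   # equal up to floating-point tolerance
```

Errors of this magnitude are far too small to move a p-value in the second or third decimal place, so they are unlikely to explain the differences in the output above.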

This matters because after standardization the predictor SR3.theta suddenly becomes significant (p = 0.0461 instead of 0.05355).
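One likely explanation, assuming reg.modeldef2d contains the interaction SR3.theta:SW2.Teiler.theta shown in the output: R builds the interaction column from the variables as they are in the data frame, here the already standardized ones. In a model with an interaction, the coefficient of a main effect is the effect of that variable where the other interacting variable equals zero; after centering, zero is that variable's mean, so a different hypothesis is being tested. A minimal sketch on simulated data (the names `a`, `b`, `y` are hypothetical) shows that the interaction t-value and R² stay the same while the main-effect t-values change:

```r
# Simulated data with an interaction; all names and values are hypothetical
set.seed(42)
n <- 86  # 79 residual df + 7 coefficients, as in the post
d <- data.frame(a = rnorm(n, mean = 2, sd = 1),
                b = rnorm(n, mean = -1, sd = 2))
d$y <- 0.3 * d$a + 0.1 * d$b + 0.2 * d$a * d$b + rnorm(n)

d_std <- as.data.frame(scale(d))

# a * b expands to a + b + a:b; the product a:b is formed from the
# columns of whichever data frame is passed (raw or standardized)
fit_raw <- lm(y ~ a * b, data = d)
fit_std <- lm(y ~ a * b, data = d_std)

t_raw <- summary(fit_raw)$coefficients[, "t value"]
t_std <- summary(fit_std)$coefficients[, "t value"]

# Same interaction t-value and the same overall fit ...
print(all.equal(t_raw["a:b"], t_std["a:b"]))
print(all.equal(summary(fit_raw)$r.squared, summary(fit_std)$r.squared))

# ... but a and b are now tested at the mean of the other variable
# instead of at zero, so their t-values differ
print(rbind(raw = t_raw[c("a", "b")], standardized = t_std[c("a", "b")]))
```

This matches the pattern in the post: identical R², F-statistic and interaction p-value, with only SR3.theta, SW2.Teiler.theta and the intercept changing.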

Thanks for all your help in advance!

Regards, Martin

Did you standardize the left-hand side variable as well as the explanatory variables?

Thanks for the quick reply. Yes, I did; I standardized the whole data frame.

I would double-check that. I realize that sounds strange to say, but if you standardized everything, then you are quite right to be puzzled about the intercept. An OLS regression with an intercept passes through the point of means of the data. If both the left- and right-hand sides have mean zero, the intercept should be zero.
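The "passes through the means" argument can be checked directly. In this minimal R sketch on simulated data (hypothetical names `a`, `b`, `y`), the fitted equation evaluated at the column means of the model matrix reproduces mean(y), and those columns include the a:b product. That product is not centered even when a and b are (its mean equals cor(a, b) * (n - 1) / n), so, assuming reg.modeldef2d contains an interaction as the output suggests, the intercept can be non-zero even after standardizing everything.

```r
set.seed(3)
n <- 86
d <- data.frame(a = rnorm(n), b = rnorm(n))
d$b <- d$b + 0.4 * d$a                       # make a and b correlated
d$y <- d$a + d$b + 0.5 * d$a * d$b + rnorm(n)
d_std <- as.data.frame(scale(d))             # every column: mean 0, sd 1

fit <- lm(y ~ a * b, data = d_std)
X <- model.matrix(fit)                       # columns: (Intercept), a, b, a:b

# OLS with an intercept passes through the point of means of all columns
print(all.equal(mean(d_std$y), sum(coef(fit) * colMeans(X))))

# a and b are centered, but their product is not
m_ab <- mean(d_std$a * d_std$b)
print(all.equal(m_ab, cor(d$a, d$b) * (n - 1) / n))

# So the intercept equals minus the interaction coefficient times that mean
print(all.equal(unname(coef(fit)["(Intercept)"]),
                unname(-coef(fit)["a:b"] * m_ab)))

# Without the interaction the intercept is zero up to rounding error
print(coef(lm(y ~ a + b, data = d_std))["(Intercept)"])
```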

Could you post more of your code, or the summary() of both data frames? And please show what reg.modeldef2d looks like.
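For the requested checks, here is a minimal R sketch. check_standardized() is a hypothetical helper, not part of any package, that reports the mean and standard deviation of each numeric column; terms() shows which terms a formula actually expands to, including interactions. The demo data and formula are made up; with the real objects one would call check_standardized(dta2_std) and attr(terms(reg.modeldef2d), "term.labels").

```r
# Hypothetical helper: mean and sd of every numeric column, with a flag
# that is TRUE when the column looks standardized (mean 0, sd 1)
check_standardized <- function(df, tol = 1e-8) {
  num <- vapply(df, is.numeric, logical(1))
  means <- vapply(df[num], mean, numeric(1), na.rm = TRUE)
  sds <- vapply(df[num], sd, numeric(1), na.rm = TRUE)
  data.frame(mean = means, sd = sds,
             standardized = abs(means) < tol & abs(sds - 1) < tol)
}

# Made-up demo data with one non-numeric column
set.seed(11)
demo <- data.frame(x = rnorm(30, mean = 5), y = rnorm(30, mean = -2, sd = 3),
                   g = rep(c("p", "q", "r"), 10))
demo_std <- demo
num <- vapply(demo, is.numeric, logical(1))
demo_std[num] <- lapply(demo[num], function(v) as.vector(scale(v)))

print(check_standardized(demo))      # raw data: flags are FALSE
print(check_standardized(demo_std))  # standardized copy: flags are TRUE

# Terms a formula expands to; x * g also adds the product term x:g
f <- y ~ x * g
print(attr(terms(f), "term.labels"))
```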
