I'm running a multiple regression analysis for economics and I accidentally entered my model with the expected signs included (i.e. y ~ x1 - x2 + x3 - x4 ...). For some reason my R-squared value was noticeably higher after I did this. Why is this? What is it doing?

Hi @akendri,

The short answer is that a minus sign in an R formula removes a term from the model; it does not mark an expected sign. In other words, you are dropping `x2` and `x4` from the model. See the example below:

```
dd <- data.frame(
y = rnorm(100),
x1 = rnorm(100),
x2 = rnorm(100),
x3 = rnorm(100),
x4 = rnorm(100)
)
lm(y ~ x1 - x2 + x3 - x4, dd)
#>
#> Call:
#> lm(formula = y ~ x1 - x2 + x3 - x4, data = dd)
#>
#> Coefficients:
#> (Intercept) x1 x3
#> -0.05084 0.10392 0.08358
lm(y ~ x1 + x3, dd) # same as above
#>
#> Call:
#> lm(formula = y ~ x1 + x3, data = dd)
#>
#> Coefficients:
#> (Intercept) x1 x3
#> -0.05084 0.10392 0.08358
with(dd, model.matrix(y ~ x1 - x2 + x3 - x4))
#> (Intercept) x1 x3
#> 1 1 -0.95119963 0.001864079
#> 2 1 -0.77260624 -0.469920312
#> 3 1 0.31157269 0.435967828
#> [ reached getOption("max.print") -- omitted 97 rows ]
#> attr(,"assign")
#> [1] 0 1 2
```
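To check which terms a formula keeps without fitting anything, you can inspect its `terms()` object. This is only a small sketch using the same variable names as above; the object names `f_minus`, `f_full`, and `f_noint` are made up for illustration.

```
# Formula from the question: `-` removes terms instead of flipping signs
f_minus <- y ~ x1 - x2 + x3 - x4
attr(terms(f_minus), "term.labels")
#> [1] "x1" "x3"

# To let the data determine each sign, include every variable with `+`
f_full <- y ~ x1 + x2 + x3 + x4
attr(terms(f_full), "term.labels")
#> [1] "x1" "x2" "x3" "x4"

# `- 1` removes the intercept in the same way (1 = kept, 0 = dropped)
f_noint <- y ~ x1 + x3 - 1
attr(terms(f_noint), "intercept")
#> [1] 0
```

The sign of each coefficient is estimated from the data, so there is no need to encode the expected sign in the formula.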

Thank you! When I run the regression with lm(y ~ x1 - x2 + x3 - x4, dd) I do not receive a value for the intercept, but when I run it as lm(y ~ x1 + x3, dd) I do get one (and consequently also get slightly different coefficient estimates). Do you know why this is happening? Also, my R-squared and adjusted R-squared are about 0.2 lower in the latter regression (the one with an intercept). Do you know why this may be?

I am not able to reproduce your issue of not having an intercept estimate. Could you provide some code that reproduces this issue?

I think I may have figured it out. It looks like the model was just dropping the intercept. When I use lm(y ~ x1 + x3 - 1, dd) the results match the other model.
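A likely reason the R-squared values differ: for a model without an intercept, `summary.lm()` measures R-squared against the uncentered total sum of squares, `sum(y^2)`, instead of the centered one, `sum((y - mean(y))^2)`, so the two numbers are not directly comparable. Below is a minimal sketch on simulated data; the means and coefficients are made up for illustration and are not the data from this thread.

```
set.seed(1)
# Hypothetical data: regressors with nonzero means, as is common in economics
dd <- data.frame(x1 = rnorm(100, mean = 10), x3 = rnorm(100, mean = 5))
dd$y <- 2 + 0.5 * dd$x1 + 0.3 * dd$x3 + rnorm(100)

with_int <- lm(y ~ x1 + x3, dd)      # intercept included
no_int   <- lm(y ~ x1 + x3 - 1, dd)  # intercept removed

# The reported R-squared is higher for the no-intercept fit on this data
summary(with_int)$r.squared
summary(no_int)$r.squared

# Recompute R-squared for the no-intercept fit by hand
rss <- sum(resid(no_int)^2)
r2_uncentered <- 1 - rss / sum(dd$y^2)                   # what summary() reports
r2_centered   <- 1 - rss / sum((dd$y - mean(dd$y))^2)    # comparable to with_int

all.equal(r2_uncentered, summary(no_int)$r.squared)
r2_centered
```

On the centered scale the no-intercept fit cannot beat the model with an intercept, so the higher reported value reflects the different baseline and not a better fit.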
