Hi everyone,
I'm currently doing an assignment for my statistic course, and the question is like this:
Use the MS Excel file beer.xlsx to answer the questions below. In this file,
q = litres of beer consumed by a household,
pb = price of beer ()
pl = price of other liquor ()
pr = price of other goods and services (everything but alcohol, index)
i = household income ($/year).
(1) Run the OLS linear regression (to estimate the coefficients) as given below. Report the estimated equation. Attach your R summary.
q ̂i = β ̂0 + β ̂1pbi+ β ̂2pli + β ̂3pri + β ̂4ii (*)
(2)Is there evidence for multicollinearity? Attach the correlation (hint: function cor) table. Point out which numbers in the table indicate multicollinearity if any.
and my answers are like these:
(1)
library(readxl)
beer=read_excel("~/Desktop/ECON 333/beer.xlsx")
beer
# A tibble: 30 x 5
q pb pl pr i
<dbl> <dbl> <dbl> <dbl> <dbl>
1 81.7 1.78 6.95 1.11 25088
2 56.9 2.27 7.32 0.67 26561
3 64.1 2.21 6.96 0.83 25510
4 65.4 2.15 7.18 0.75 27158
5 64.1 2.26 7.46 1.06 27162
6 58.1 2.49 7.47 1.1 27583
7 61.7 2.52 7.88 1.09 28235
8 65.3 2.46 7.88 1.18 29413
9 57.8 2.54 7.97 0.88 28713
10 63.5 2.72 7.96 1.3 30000
# … with 20 more rows
lm (q~pb+pl+pr*i,beer)
Call:
lm(formula = q ~ pb + pl + pr * i, data = beer)
Coefficients:
(Intercept) pb pl pr i pr:i
6.632e+01 -2.396e+01 -4.846e+00 2.710e+01 2.718e-03 -4.384e-04
a = lm(q~pb+pl+pr*i,beer)
summary (a)
Call:
lm(formula = q ~ pb + pl + pr * i, data = beer)
Residuals:
Min 1Q Median 3Q Max
-9.8503 -1.7605 -0.1913 2.0372 5.6603
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.632e+01 2.819e+01 2.353 0.027181 *
pb -2.396e+01 5.488e+00 -4.366 0.000208 ***
pl -4.846e+00 4.064e+00 -1.192 0.244762
pr 2.710e+01 1.977e+01 1.371 0.183043
i 2.718e-03 1.259e-03 2.159 0.041027 *
pr:i -4.384e-04 5.973e-04 -0.734 0.470066
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.603 on 24 degrees of freedom
Multiple R-squared: 0.826, Adjusted R-squared: 0.7898
F-statistic: 22.79 on 5 and 24 DF, p-value: 2.145e-08
volume = beer$q
price1 = beer$pb
price2 = beer$pl
price3 = beer$pr
cor (volume, price1)
[1] -0.8038513
cor (volume, price2)
[1] -0.7616717
cor (volume, price3)
[1] -0.4434794
(2)
```{r}
fit = lm(log(q)~pb+pl+pr*i,beer)
summary(fit)
Call:
lm(formula = log(q) ~ pb + pl + pr * i, data = beer)
Residuals:
Min 1Q Median 3Q Max
-0.190391 -0.027866 0.001757 0.039750 0.083529
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.174e+00 4.988e-01 8.368 1.41e-08 ***
pb -3.936e-01 9.711e-02 -4.053 0.000461 ***
pl -8.509e-02 7.191e-02 -1.183 0.248290
pr 4.705e-01 3.497e-01 1.345 0.191088
i 4.703e-05 2.227e-05 2.112 0.045288 *
pr:i -8.421e-06 1.057e-05 -0.797 0.433400
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.06374 on 24 degrees of freedom
Multiple R-squared: 0.8106, Adjusted R-squared: 0.7712
F-statistic: 20.55 on 5 and 24 DF, p-value: 5.783e-08
but now I'm confused if I need to use both data(or I might wrong for one of them) and how to find the evidence for multicollinearity.
Thank you for your help.
Jennifer