detecting multicollinearity

Hi everyone,

I'm currently doing an assignment for my statistics course, and the question reads as follows:

Use the MS Excel file beer.xlsx to answer the questions below. In this file,
q = litres of beer consumed by a household,
pb = price of beer (),
pl = price of other liquor (),
pr = price of other goods and services (everything but alcohol, index)
i = household income ($/year).

(1) Run the OLS linear regression (to estimate the coefficients) as given below. Report the estimated equation. Attach your R summary.
$$\hat{q}_i = \hat{\beta}_0 + \hat{\beta}_1\, pb_i + \hat{\beta}_2\, pl_i + \hat{\beta}_3\, pr_i + \hat{\beta}_4\, i_i \qquad (*)$$

(2) Is there evidence for multicollinearity? Attach the correlation table (hint: function cor). Point out which numbers in the table indicate multicollinearity, if any.

and my answers are as follows:

(1)

library(readxl)
beer <- read_excel("~/Desktop/ECON 333/beer.xlsx")
beer
# A tibble: 30 x 5
       q    pb    pl    pr     i
   <dbl> <dbl> <dbl> <dbl> <dbl>
 1  81.7  1.78  6.95  1.11 25088
 2  56.9  2.27  7.32  0.67 26561
 3  64.1  2.21  6.96  0.83 25510
 4  65.4  2.15  7.18  0.75 27158
 5  64.1  2.26  7.46  1.06 27162
 6  58.1  2.49  7.47  1.1  27583
 7  61.7  2.52  7.88  1.09 28235
 8  65.3  2.46  7.88  1.18 29413
 9  57.8  2.54  7.97  0.88 28713
10  63.5  2.72  7.96  1.3  30000
# … with 20 more rows

lm(q ~ pb + pl + pr * i, data = beer)
Call:
lm(formula = q ~ pb + pl + pr * i, data = beer)

Coefficients:
(Intercept)           pb           pl           pr            i         pr:i  
  6.632e+01   -2.396e+01   -4.846e+00    2.710e+01    2.718e-03   -4.384e-04  
  
a <- lm(q ~ pb + pl + pr * i, data = beer)
summary(a)
Call:
lm(formula = q ~ pb + pl + pr * i, data = beer)

Residuals:
    Min      1Q  Median      3Q     Max 
-9.8503 -1.7605 -0.1913  2.0372  5.6603 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  6.632e+01  2.819e+01   2.353 0.027181 *  
pb          -2.396e+01  5.488e+00  -4.366 0.000208 ***
pl          -4.846e+00  4.064e+00  -1.192 0.244762    
pr           2.710e+01  1.977e+01   1.371 0.183043    
i            2.718e-03  1.259e-03   2.159 0.041027 *  
pr:i        -4.384e-04  5.973e-04  -0.734 0.470066    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.603 on 24 degrees of freedom
Multiple R-squared:  0.826,	Adjusted R-squared:  0.7898 
F-statistic: 22.79 on 5 and 24 DF,  p-value: 2.145e-08
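
One thing to double-check before interpreting this fit: in an R formula, pr*i expands to pr + i + pr:i, so the model above includes an interaction term that equation (*) does not have. A minimal sketch of the additive model exactly as written in (*), assuming the same beer data frame:

fit_star <- lm(q ~ pb + pl + pr + i, data = beer)  # predictors enter with +, no interaction
summary(fit_star)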

volume <- beer$q
price1 <- beer$pb
price2 <- beer$pl
price3 <- beer$pr
cor(volume, price1)
[1] -0.8038513
cor(volume, price2)
[1] -0.7616717
cor(volume, price3)
[1] -0.4434794
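
These three calls only correlate q with each price. The hint's cor function can also produce the whole correlation table in one call; restricting it to the predictor columns shows how the independent variables relate to each other (a sketch, using the column names from the tibble above):

cor(beer[, c("pb", "pl", "pr", "i")])  # pairwise correlations among the independent variables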




(2) 
fit <- lm(log(q) ~ pb + pl + pr * i, data = beer)
summary(fit)

Call:
lm(formula = log(q) ~ pb + pl + pr * i, data = beer)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.190391 -0.027866  0.001757  0.039750  0.083529 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.174e+00  4.988e-01   8.368 1.41e-08 ***
pb          -3.936e-01  9.711e-02  -4.053 0.000461 ***
pl          -8.509e-02  7.191e-02  -1.183 0.248290    
pr           4.705e-01  3.497e-01   1.345 0.191088    
i            4.703e-05  2.227e-05   2.112 0.045288 *  
pr:i        -8.421e-06  1.057e-05  -0.797 0.433400    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.06374 on 24 degrees of freedom
Multiple R-squared:  0.8106,	Adjusted R-squared:  0.7712 
F-statistic: 20.55 on 5 and 24 DF,  p-value: 5.783e-08


But now I'm confused about whether I need to use both models (or whether one of them is wrong), and about how to find the evidence for multicollinearity.

Thank you for your help.


Jennifer

Hi, welcome!

Please have a look at our homework policy: homework-inspired questions are welcome, but they should not include verbatim instructions from your course.

Ah, sorry about that! Thank you for telling me!

Without doing the assignment, I think it is ok to say:

It is the independent variables (not the dependent variable) that we need to check for high correlation. So look at all possible pairs of independent variables and flag large (in absolute value) correlations. If a pair of independent variables is highly correlated, consider removing one of them from the regression, particularly the one with the larger p-value.
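
In R, that check is one call on the predictor columns, as in the cor() sketch in the question above. If your course has introduced them, variance inflation factors are a common complementary diagnostic. A minimal sketch, assuming the car package is installed and using the additive model from equation (*) (VIFs are hard to interpret when an interaction term is in the model):

library(car)                                       # assumes the car package is installed
fit_star <- lm(q ~ pb + pl + pr + i, data = beer)  # additive model from (*)
vif(fit_star)  # values far above ~10 are a common rule-of-thumb sign of multicollinearity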
