I'm doing some Multiple Hypothesis Testing of regression when I cluster the standard error over one of the variables. Without the clustering I'm getting the correct answer. However, when I'm doing the same analysis with clustering results seems to be wrong (Compared to Stata's results).
It is worth mentioning that the clustered SE are correct, as well as the F-statistic.
Any thoughts or ideas?
I attach the reprex output:
library(datasets)
library(car)
#> Loading required package: carData
library(multiwayvcov)
db <- airquality
lm<-lm(Ozone~Solar.R+Wind+Temp,data=db)
sqrt(diag(cluster.vcov(lm,~Month))) # These are exactly Stata's SE when I cluster by Months
#> (Intercept) Solar.R Wind Temp
#> 21.30110653 0.03345001 1.18106275 0.15831068
linearHypothesis(lm,c("Wind= 0","Solar.R=0"),vcov=cluster.vcov(lm,~Month)) #clustered
#> Linear hypothesis test
#>
#> Hypothesis:
#> Wind = 0
#> Solar.R = 0
#>
#> Model 1: restricted model
#> Model 2: Ozone ~ Solar.R + Wind + Temp
#>
#> Note: Coefficient covariance matrix supplied.
#>
#> Res.Df Df F Pr(>F)
#> 1 109
#> 2 107 2 3.9834 0.02145 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
linearHypothesis(lm,c("Wind= 0","Solar.R=0")) #non-clustered
#> Linear hypothesis test
#>
#> Hypothesis:
#> Wind = 0
#> Solar.R = 0
#>
#> Model 1: restricted model
#> Model 2: Ozone ~ Solar.R + Wind + Temp
#>
#> Res.Df RSS Df Sum of Sq F Pr(>F)
#> 1 109 62367
#> 2 107 48003 2 14365 16.01 8.27e-07 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here is the Stata's output as a compare.
. import excel "C:\Dropbox\airquality.xlsx", sheet("Sheet1") cellrange(B1:G154) firstrow clear
. reg Ozone SolarR Wind Temp ,cluster(Month)
Linear regression Number of obs = 111
F( 3, 4) = 81.23
Prob > F = 0.0005
R-squared = 0.6059
Root MSE = 21.181
(Std. Err. adjusted for 5 clusters in Month)
------------------------------------------------------------------------------
| Robust
Ozone | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
SolarR | .0598206 .03345 1.79 0.148 -.0330515 .1526927
Wind | -3.333591 1.181063 -2.82 0.048 -6.612747 -.0544354
Temp | 1.652093 .1583107 10.44 0.000 1.212552 2.091634
_cons | -64.34208 21.30111 -3.02 0.039 -123.4834 -5.200726
------------------------------------------------------------------------------
. test Wind=SolarR=0
( 1) - SolarR + Wind = 0
( 2) Wind = 0
F( 2, 4) = 3.98
Prob > F = 0.1117
. reg Ozone SolarR Wind Temp
Source | SS df MS Number of obs = 111
-------------+------------------------------ F( 3, 107) = 54.83
Model | 73799.1195 3 24599.7065 Prob > F = 0.0000
Residual | 48002.7904 107 448.62421 R-squared = 0.6059
-------------+------------------------------ Adj R-squared = 0.5948
Total | 121801.91 110 1107.29009 Root MSE = 21.181
------------------------------------------------------------------------------
Ozone | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
SolarR | .0598206 .0231865 2.58 0.011 .0138561 .1057851
Wind | -3.333591 .6544071 -5.09 0.000 -4.630877 -2.036306
Temp | 1.652093 .2535298 6.52 0.000 1.1495 2.154686
_cons | -64.34208 23.05472 -2.79 0.006 -110.0454 -18.63878
------------------------------------------------------------------------------
. test Wind=SolarR=0
( 1) - SolarR + Wind = 0
( 2) Wind = 0
F( 2, 107) = 16.01
Prob > F = 0.0000