not correct results of Multiple Hypothesis Testing with clustering

I'm doing some Multiple Hypothesis Testing of regression when I cluster the standard error over one of the variables. Without the clustering I'm getting the correct answer. However, when I'm doing the same analysis with clustering results seems to be wrong (Compared to Stata's results).
It is worth mentioning that the clustered SE are correct, as well as the F-statistic.
Any thoughts or ideas?

I attach the reprex output:

library(datasets)
library(car)
#> Loading required package: carData
library(multiwayvcov)
   
db <- airquality

lm<-lm(Ozone~Solar.R+Wind+Temp,data=db) 
sqrt(diag(cluster.vcov(lm,~Month))) # These are exactly Stata's SE when I cluster by Months
#> (Intercept)     Solar.R        Wind        Temp 
#> 21.30110653  0.03345001  1.18106275  0.15831068
               
linearHypothesis(lm,c("Wind= 0","Solar.R=0"),vcov=cluster.vcov(lm,~Month)) #clustered
#> Linear hypothesis test
#> 
#> Hypothesis:
#> Wind = 0
#> Solar.R = 0
#> 
#> Model 1: restricted model
#> Model 2: Ozone ~ Solar.R + Wind + Temp
#> 
#> Note: Coefficient covariance matrix supplied.
#> 
#>   Res.Df Df      F  Pr(>F)  
#> 1    109                    
#> 2    107  2 3.9834 0.02145 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
linearHypothesis(lm,c("Wind= 0","Solar.R=0")) #non-clustered
#> Linear hypothesis test
#> 
#> Hypothesis:
#> Wind = 0
#> Solar.R = 0
#> 
#> Model 1: restricted model
#> Model 2: Ozone ~ Solar.R + Wind + Temp
#> 
#>   Res.Df   RSS Df Sum of Sq     F   Pr(>F)    
#> 1    109 62367                                
#> 2    107 48003  2     14365 16.01 8.27e-07 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here is the Stata's output as a compare.


. import excel "C:\Dropbox\airquality.xlsx", sheet("Sheet1") cellrange(B1:G154) firstrow clear

. reg Ozone SolarR Wind Temp ,cluster(Month)

Linear regression                                      Number of obs =     111
                                                       F(  3,     4) =   81.23
                                                       Prob > F      =  0.0005
                                                       R-squared     =  0.6059
                                                       Root MSE      =  21.181

                                  (Std. Err. adjusted for 5 clusters in Month)
------------------------------------------------------------------------------
             |               Robust
       Ozone |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      SolarR |   .0598206     .03345     1.79   0.148    -.0330515    .1526927
        Wind |  -3.333591   1.181063    -2.82   0.048    -6.612747   -.0544354
        Temp |   1.652093   .1583107    10.44   0.000     1.212552    2.091634
       _cons |  -64.34208   21.30111    -3.02   0.039    -123.4834   -5.200726
------------------------------------------------------------------------------

. test Wind=SolarR=0

 ( 1)  - SolarR + Wind = 0
 ( 2)  Wind = 0

       F(  2,     4) =    3.98
            Prob > F =    0.1117

. reg Ozone SolarR Wind Temp

      Source |       SS       df       MS              Number of obs =     111
-------------+------------------------------           F(  3,   107) =   54.83
       Model |  73799.1195     3  24599.7065           Prob > F      =  0.0000
    Residual |  48002.7904   107   448.62421           R-squared     =  0.6059
-------------+------------------------------           Adj R-squared =  0.5948
       Total |   121801.91   110  1107.29009           Root MSE      =  21.181

------------------------------------------------------------------------------
       Ozone |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      SolarR |   .0598206   .0231865     2.58   0.011     .0138561    .1057851
        Wind |  -3.333591   .6544071    -5.09   0.000    -4.630877   -2.036306
        Temp |   1.652093   .2535298     6.52   0.000       1.1495    2.154686
       _cons |  -64.34208   23.05472    -2.79   0.006    -110.0454   -18.63878
------------------------------------------------------------------------------

. test Wind=SolarR=0

 ( 1)  - SolarR + Wind = 0
 ( 2)  Wind = 0

       F(  2,   107) =   16.01
            Prob > F =    0.0000


Your R code reproduces perfectly, so you must be using data(airquality) for db and

> nrow(airquality)
[1] 153

I'm not a speaker of strata, but its data source is a spreadsheet and if "Number of obs = 111" refers dimension of the data along the y axis, it looks like it's missing 42 observations.

Why don't you export data(airquality) to cvs, read it into spreadsheet and see if you still have a disconnect?

Hi, Thx for the feedback.
However this does not concern me as I have the same num of obs both in 'Stata' and in 'R'.

Anyway, I found the reason for the difference between the two software.

The F-statistic is the same but the p value isn't, so the certical value must be calculated in respect to different Degrees of Freedom.
As specified in the results it self. Once I have notice that I can correct it manually.

With 'pf()' function.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.