not correct results of Multiple Hypothesis Testing with clustering

I'm doing some Multiple Hypothesis Testing of regression when I cluster the standard error over one of the variables. Without the clustering I'm getting the correct answer. However, when I'm doing the same analysis with clustering results seems to be wrong (Compared to Stata's results).
It is worth mentioning that the clustered SE are correct, as well as the F-statistic.
Any thoughts or ideas?

I attach the reprex output:

``````library(datasets)
library(car)
library(multiwayvcov)

db <- airquality

lm<-lm(Ozone~Solar.R+Wind+Temp,data=db)
sqrt(diag(cluster.vcov(lm,~Month))) # These are exactly Stata's SE when I cluster by Months
#> (Intercept)     Solar.R        Wind        Temp
#> 21.30110653  0.03345001  1.18106275  0.15831068

linearHypothesis(lm,c("Wind= 0","Solar.R=0"),vcov=cluster.vcov(lm,~Month)) #clustered
#> Linear hypothesis test
#>
#> Hypothesis:
#> Wind = 0
#> Solar.R = 0
#>
#> Model 1: restricted model
#> Model 2: Ozone ~ Solar.R + Wind + Temp
#>
#> Note: Coefficient covariance matrix supplied.
#>
#>   Res.Df Df      F  Pr(>F)
#> 1    109
#> 2    107  2 3.9834 0.02145 *
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
linearHypothesis(lm,c("Wind= 0","Solar.R=0")) #non-clustered
#> Linear hypothesis test
#>
#> Hypothesis:
#> Wind = 0
#> Solar.R = 0
#>
#> Model 1: restricted model
#> Model 2: Ozone ~ Solar.R + Wind + Temp
#>
#>   Res.Df   RSS Df Sum of Sq     F   Pr(>F)
#> 1    109 62367
#> 2    107 48003  2     14365 16.01 8.27e-07 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
``````

Here is the Stata's output as a compare.

``````
. import excel "C:\Dropbox\airquality.xlsx", sheet("Sheet1") cellrange(B1:G154) firstrow clear

. reg Ozone SolarR Wind Temp ,cluster(Month)

Linear regression                                      Number of obs =     111
F(  3,     4) =   81.23
Prob > F      =  0.0005
R-squared     =  0.6059
Root MSE      =  21.181

(Std. Err. adjusted for 5 clusters in Month)
------------------------------------------------------------------------------
|               Robust
Ozone |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
SolarR |   .0598206     .03345     1.79   0.148    -.0330515    .1526927
Wind |  -3.333591   1.181063    -2.82   0.048    -6.612747   -.0544354
Temp |   1.652093   .1583107    10.44   0.000     1.212552    2.091634
_cons |  -64.34208   21.30111    -3.02   0.039    -123.4834   -5.200726
------------------------------------------------------------------------------

. test Wind=SolarR=0

( 1)  - SolarR + Wind = 0
( 2)  Wind = 0

F(  2,     4) =    3.98
Prob > F =    0.1117

. reg Ozone SolarR Wind Temp

Source |       SS       df       MS              Number of obs =     111
-------------+------------------------------           F(  3,   107) =   54.83
Model |  73799.1195     3  24599.7065           Prob > F      =  0.0000
Residual |  48002.7904   107   448.62421           R-squared     =  0.6059
Total |   121801.91   110  1107.29009           Root MSE      =  21.181

------------------------------------------------------------------------------
Ozone |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
SolarR |   .0598206   .0231865     2.58   0.011     .0138561    .1057851
Wind |  -3.333591   .6544071    -5.09   0.000    -4.630877   -2.036306
Temp |   1.652093   .2535298     6.52   0.000       1.1495    2.154686
_cons |  -64.34208   23.05472    -2.79   0.006    -110.0454   -18.63878
------------------------------------------------------------------------------

. test Wind=SolarR=0

( 1)  - SolarR + Wind = 0
( 2)  Wind = 0

F(  2,   107) =   16.01
Prob > F =    0.0000

``````

Your `R` code reproduces perfectly, so you must be using data(airquality) for db and

``````> nrow(airquality)
[1] 153
``````

I'm not a speaker of `strata`, but its data source is a spreadsheet and if "Number of obs = 111" refers dimension of the data along the y axis, it looks like it's missing 42 observations.

Why don't you export data(airquality) to cvs, read it into spreadsheet and see if you still have a disconnect?

Hi, Thx for the feedback.
However this does not concern me as I have the same num of obs both in 'Stata' and in 'R'.

Anyway, I found the reason for the difference between the two software.

The F-statistic is the same but the p value isn't, so the certical value must be calculated in respect to different Degrees of Freedom.
As specified in the results it self. Once I have notice that I can correct it manually.

With 'pf()' function.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.