I need some help estimating a model using cluster robust standard errors with lmtest
and sandwich
package. I'm conducting a longitudinal ancova using weight as dependent variable. It's a public health study where participants were recruited in different doctors offices (variable praxisid) leading to a clustered data structure. My initial model accounted for clustered data by including praxisid as a factor to control for differences between different clusters/doctors offices, then using robust standard errors (vcovCL argument cluster = ~ praxisid) for proper estimation of the standard erros. This is the initial model (gewicht_fu2 = weight at follow up, gruppe = group, gewicht = weight, alter_unt = age at measurement, geschlecht = se x, praxisid = ID of doctors office).``
` ` ancova_model_base = lm(gewicht_fu2 ~ gruppe + gewicht + alter_unt + geschlecht + factor(praxisid), data=data)
> clustered_se_base <- vcovCL(ancova_model_base, cluster = ~ praxisid, type = "HC1")
> results_base <- coeftest(ancova_model_base, vcov = clustered_se_base)
> print(results_base)
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 24.128480 7.122828 3.3875 0.0007542 ***
gruppe2 3.931860 0.359936 10.9238 < 2.2e-16 ***
gewicht 0.921221 0.023070 39.9319 < 2.2e-16 ***
alter_unt -0.227253 0.079054 -2.8747 0.0041961 **
geschlecht -1.164494 0.665107 -1.7508 0.0805138 .
factor(praxisid)0102 -5.559858 0.530346 -10.4834 < 2.2e-16 ***
factor(praxisid)0103 -4.135628 0.413729 -9.9960 < 2.2e-16 ***
factor(praxisid)0104 -4.728151 0.542333 -8.7182 < 2.2e-16 ***
factor(praxisid)0105 2.299247 0.141410 16.2594 < 2.2e-16 ***
factor(praxisid)0106 -0.342249 0.174957 -1.9562 0.0509321 .
factor(praxisid)0107 0.862412 0.105880 8.1452 2.417e-15 ***
factor(praxisid)0108 -3.177997 0.417356 -7.6146 1.110e-13 ***
factor(praxisid)0109 -3.980172 0.451470 -8.8160 < 2.2e-16 ***
factor(praxisid)0110 -6.397870 0.546846 -11.6996 < 2.2e-16 ***
factor(praxisid)0111 -0.887394 0.250536 -3.5420 0.0004298 ***
factor(praxisid)0112 1.317657 0.052704 25.0013 < 2.2e-16 ***
factor(praxisid)0113 -1.247635 0.165212 -7.5517 1.724e-13 ***
factor(praxisid)0114 -4.253447 0.257481 -16.5195 < 2.2e-16 ***
factor(praxisid)0115 -1.335301 0.202419 -6.5967 9.638e-11 ***
factor(praxisid)0116 -4.269958 0.330097 -12.9355 < 2.2e-16 ***
factor(praxisid)0117 -4.789319 0.576171 -8.3123 6.949e-16 ***
factor(praxisid)0118 2.285457 0.720741 3.1710 0.0016012 **
factor(praxisid)0119 3.369629 0.453080 7.4372 3.817e-13 ***
factor(praxisid)0120 -9.393821 0.331001 -28.3800 < 2.2e-16 ***
factor(praxisid)0121 -7.729174 0.212690 -36.3401 < 2.2e-16 ***
factor(praxisid)0123 2.501104 0.723865 3.4552 0.0005909 ***
factor(praxisid)0124 -5.228440 0.646971 -8.0814 3.868e-15 ***
factor(praxisid)0125 0.662840 0.685234 0.9673 0.3337961
factor(praxisid)0126 -5.809083 1.147089 -5.0642 5.552e-07 ***
factor(praxisid)0128 -8.601282 0.255758 -33.6305 < 2.2e-16 ***
factor(praxisid)0129 -3.823872 0.627748 -6.0914 2.064e-09 ***
factor(praxisid)0130 0.874424 0.286838 3.0485 0.0024069 **
factor(praxisid)0131 -2.550574 0.239055 -10.6694 < 2.2e-16 ***
factor(praxisid)0132 -5.731985 0.301485 -19.0125 < 2.2e-16 ***
factor(praxisid)0133 -7.729465 0.645706 -11.9706 < 2.2e-16 ***
factor(praxisid)0134 -4.198254 0.478053 -8.7820 < 2.2e-16 ***
factor(praxisid)0136 7.605662 0.431907 17.6095 < 2.2e-16 ***
factor(praxisid)0137 -3.344320 0.634414 -5.2715 1.925e-07 ***
factor(praxisid)0138 -1.925985 0.430881 -4.4699 9.451e-06 ***
factor(praxisid)0140 -5.917373 0.140893 -41.9990 < 2.2e-16 ***
factor(praxisid)0141 -5.045173 0.461395 -10.9346 < 2.2e-16 ***
factor(praxisid)0201 -3.445644 1.035772 -3.3266 0.0009358 ***
factor(praxisid)0202 -3.045525 0.407840 -7.4675 3.096e-13 ***
factor(praxisid)0203 1.130478 0.247791 4.5622 6.206e-06 ***
factor(praxisid)0204 2.145188 0.188710 11.3676 < 2.2e-16 ***
factor(praxisid)0205 -7.457343 0.506050 -14.7364 < 2.2e-16 ***
factor(praxisid)0206 1.594615 0.908716 1.7548 0.0798325 .
factor(praxisid)0207 -0.393628 0.191666 -2.0537 0.0404602 *
factor(praxisid)0208 -0.727459 0.249509 -2.9156 0.0036906 **
factor(praxisid)0209 -3.526821 0.630328 -5.5952 3.430e-08 ***
factor(praxisid)0210 -3.441156 0.363318 -9.4715 < 2.2e-16 ***
factor(praxisid)0211 -3.915442 0.465830 -8.4053 3.445e-16 ***
factor(praxisid)0212 -4.617452 0.176922 -26.0988 < 2.2e-16 ***
factor(praxisid)0213 1.557216 0.229727 6.7785 3.048e-11 ***
factor(praxisid)0214 0.158470 0.408048 0.3884 0.6978946
factor(praxisid)0215 -1.022694 0.076523 -13.3646 < 2.2e-16 ***
factor(praxisid)0216 -6.315419 0.729673 -8.6551 < 2.2e-16 ***
factor(praxisid)0218 -11.536564 0.640052 -18.0244 < 2.2e-16 ***
factor(praxisid)0219 -6.950880 0.760465 -9.1403 < 2.2e-16 ***
factor(praxisid)0221 -4.976299 0.437071 -11.3856 < 2.2e-16 ***
factor(praxisid)0222 -6.312480 0.711347 -8.8740 < 2.2e-16 ***
factor(praxisid)0224 -3.865069 0.364790 -10.5953 < 2.2e-16 ***
factor(praxisid)0225 -4.324352 0.715237 -6.0460 2.691e-09 ***
factor(praxisid)0227 8.912021 0.275690 32.3263 < 2.2e-16 ***
factor(praxisid)0229 -6.029149 0.249882 -24.1280 < 2.2e-16 ***
factor(praxisid)0230 -2.141115 0.521678 -4.1043 4.650e-05 ***
factor(praxisid)0302 0.208030 0.101482 2.0499 0.0408297 *
factor(praxisid)0303 -2.306601 0.125588 -18.3664 < 2.2e-16 ***
factor(praxisid)0304 -3.298788 0.311771 -10.5808 < 2.2e-16 ***
factor(praxisid)0305 -7.298794 0.338442 -21.5658 < 2.2e-16 ***
factor(praxisid)0306 -5.111923 0.370645 -13.7920 < 2.2e-16 ***
factor(praxisid)0307 -3.069838 0.544317 -5.6398 2.687e-08 ***
factor(praxisid)0308 -4.421619 0.255364 -17.3150 < 2.2e-16 ***
factor(praxisid)0309 -2.888557 0.052438 -55.0847 < 2.2e-16 ***
factor(praxisid)0310 4.273895 0.328399 13.0143 < 2.2e-16 ***
factor(praxisid)0311 -0.146787 0.206286 -0.7116 0.4770233
factor(praxisid)0312 4.698312 0.434591 10.8109 < 2.2e-16 ***
factor(praxisid)0313 -5.562244 0.575730 -9.6612 < 2.2e-16 ***
factor(praxisid)0314 1.674841 0.350199 4.7825 2.209e-06 ***
factor(praxisid)0315 -4.792902 0.088942 -53.8881 < 2.2e-16 ***
factor(praxisid)0316 0.792484 0.313411 2.5286 0.0117223 *
factor(praxisid)0317 -4.613136 0.531692 -8.6763 < 2.2e-16 ***
factor(praxisid)0318 3.177469 0.197673 16.0743 < 2.2e-16 ***
factor(praxisid)0319 2.173599 0.130523 16.6530 < 2.2e-16 ***
factor(praxisid)0320 -2.713980 0.183824 -14.7640 < 2.2e-16 ***
factor(praxisid)0321 -0.302488 0.304034 -0.9949 0.3201999
factor(praxisid)0322 -11.434293 0.619287 -18.4636 < 2.2e-16 ***
factor(praxisid)0323 -4.595552 0.568874 -8.0783 3.957e-15 ***
factor(praxisid)0324 -1.941716 0.243611 -7.9706 8.705e-15 ***
factor(praxisid)0325 -2.976395 0.347162 -8.5735 < 2.2e-16 ***
factor(praxisid)0326 -5.102085 0.091025 -56.0512 < 2.2e-16 ***
factor(praxisid)0327 -1.037689 0.512567 -2.0245 0.0433865 *
factor(praxisid)0328 -2.476252 0.330883 -7.4838 2.766e-13 ***
factor(praxisid)0329 1.803152 0.252483 7.1417 2.835e-12 ***
factor(praxisid)0330 -0.061021 0.097973 -0.6228 0.5336443
factor(praxisid)0403 -7.604338 0.268695 -28.3010 < 2.2e-16 ***
factor(praxisid)0404 2.055911 0.754197 2.7260 0.0066094 **
factor(praxisid)0405 -10.506117 0.650119 -16.1603 < 2.2e-16 ***
factor(praxisid)0407 -8.029195 0.581834 -13.7998 < 2.2e-16 ***
factor(praxisid)0408 -8.220836 0.497019 -16.5403 < 2.2e-16 ***
factor(praxisid)0409 -0.680401 0.279471 -2.4346 0.0152151 *
factor(praxisid)0411 -6.212042 0.517903 -11.9946 < 2.2e-16 ***
factor(praxisid)0412 -0.512750 0.086804 -5.9070 6.005e-09 ***
factor(praxisid)0413 -4.265538 0.299396 -14.2471 < 2.2e-16 ***
factor(praxisid)0414 -6.603663 0.644509 -10.2460 < 2.2e-16 ***
factor(praxisid)0415 -4.496306 0.655807 -6.8561 1.850e-11 ***
factor(praxisid)0502 -4.447811 0.462831 -9.6100 < 2.2e-16 ***
factor(praxisid)0504 0.695986 0.384523 1.8100 0.0708242 .
factor(praxisid)0507 -2.071683 0.249290 -8.3103 7.054e-16 ***
factor(praxisid)0508 -1.583406 0.474791 -3.3350 0.0009088 ***
factor(praxisid)0510 -1.961047 0.346387 -5.6614 2.385e-08 ***
factor(praxisid)0511 -1.625999 0.432561 -3.7590 0.0001882 ***
factor(praxisid)0513 -0.778327 0.684504 -1.1371 0.2559895
factor(praxisid)0514 -2.235513 0.431500 -5.1808 3.074e-07 ***
factor(praxisid)0516 -2.226234 0.512210 -4.3463 1.640e-05 ***
factor(praxisid)0519 -2.451443 0.309377 -7.9238 1.222e-14 ***
factor(praxisid)0520 -4.875180 0.702851 -6.9363 1.100e-11 ***
factor(praxisid)0522 2.433557 0.547511 4.4448 1.058e-05 ***
factor(praxisid)0524 -3.514030 0.400862 -8.7662 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1`
`
When colleagues tried to replicate my code, they did not include praxisid as factor into the model, leading to the main effect of group to disappear.
`
`> ancova_model_base = lm(gewicht_fu2 ~ gruppe + gewicht + alter_unt + geschlecht, data=data)
> clustered_se_base <- vcovCL(ancova_model_base, cluster = ~ praxisid, type = "HC1")
> results_base <- coeftest(ancova_model_base, vcov = clustered_se_base)
> print(results_base)
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 22.749459 6.030093 3.7727 0.0001756 ***
gruppe2 -0.186707 0.520137 -0.3590 0.7197386
gewicht 0.927351 0.018562 49.9591 < 2.2e-16 ***
alter_unt -0.227799 0.065403 -3.4830 0.0005275 ***
geschlecht -0.932407 0.543146 -1.7167 0.0864922 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1`
`
I reran all models including factor(praxisid)and excluding it. All including models show a significant group effect while all those excluding it did not. This made me question my initial results, as I seem to be eliminating cluster differences by controlling for them in the model so that only within-cluster variation remains. There are many clusters, but they mostly contain very few observations. The implications are rather big as the entire intervention effects disappears. My initial line of thinking when includign the factor was to account for systemic differences between doctors offices that would be eliminated using one fixed effect per doctors office. participants werent randomized within doctors offices but doctors offices were assigned randomized to intervention vs control group. In my mind, this must mean that doctors office is a fixed effect (that is possibly confounded with group?). A fixed effect should model these explicitely rather than just having them reflected in clustered standard errors.
Which approach is the correct one here? Including the cluster variable as fixed effect in the model or having the cluster variable only in the estimation of standard errors?