Including cluster variable in ancova using cluster robust standard errors?

adinan · June 15, 2025, 10:15pm

I need some help estimating a model using cluster robust standard errors with lmtest and sandwich package. I'm conducting a longitudinal ancova using weight as dependent variable. It's a public health study where participants were recruited in different doctors offices (variable praxisid) leading to a clustered data structure. My initial model accounted for clustered data by including praxisid as a factor to control for differences between different clusters/doctors offices, then using robust standard errors (vcovCL argument cluster = ~ praxisid) for proper estimation of the standard erros. This is the initial model (gewicht_fu2 = weight at follow up, gruppe = group, gewicht = weight, alter_unt = age at measurement, geschlecht = se x, praxisid = ID of doctors office).``

` ` ancova_model_base = lm(gewicht_fu2 ~ gruppe + gewicht + alter_unt + geschlecht + factor(praxisid), data=data)
>   clustered_se_base <- vcovCL(ancova_model_base, cluster = ~ praxisid, type = "HC1")
>   results_base <- coeftest(ancova_model_base, vcov = clustered_se_base)
>   print(results_base)

t test of coefficients:

                       Estimate Std. Error  t value  Pr(>|t|)    
(Intercept)           24.128480   7.122828   3.3875 0.0007542 ***
gruppe2                3.931860   0.359936  10.9238 < 2.2e-16 ***
gewicht                0.921221   0.023070  39.9319 < 2.2e-16 ***
alter_unt             -0.227253   0.079054  -2.8747 0.0041961 ** 
geschlecht            -1.164494   0.665107  -1.7508 0.0805138 .  
factor(praxisid)0102  -5.559858   0.530346 -10.4834 < 2.2e-16 ***
factor(praxisid)0103  -4.135628   0.413729  -9.9960 < 2.2e-16 ***
factor(praxisid)0104  -4.728151   0.542333  -8.7182 < 2.2e-16 ***
factor(praxisid)0105   2.299247   0.141410  16.2594 < 2.2e-16 ***
factor(praxisid)0106  -0.342249   0.174957  -1.9562 0.0509321 .  
factor(praxisid)0107   0.862412   0.105880   8.1452 2.417e-15 ***
factor(praxisid)0108  -3.177997   0.417356  -7.6146 1.110e-13 ***
factor(praxisid)0109  -3.980172   0.451470  -8.8160 < 2.2e-16 ***
factor(praxisid)0110  -6.397870   0.546846 -11.6996 < 2.2e-16 ***
factor(praxisid)0111  -0.887394   0.250536  -3.5420 0.0004298 ***
factor(praxisid)0112   1.317657   0.052704  25.0013 < 2.2e-16 ***
factor(praxisid)0113  -1.247635   0.165212  -7.5517 1.724e-13 ***
factor(praxisid)0114  -4.253447   0.257481 -16.5195 < 2.2e-16 ***
factor(praxisid)0115  -1.335301   0.202419  -6.5967 9.638e-11 ***
factor(praxisid)0116  -4.269958   0.330097 -12.9355 < 2.2e-16 ***
factor(praxisid)0117  -4.789319   0.576171  -8.3123 6.949e-16 ***
factor(praxisid)0118   2.285457   0.720741   3.1710 0.0016012 ** 
factor(praxisid)0119   3.369629   0.453080   7.4372 3.817e-13 ***
factor(praxisid)0120  -9.393821   0.331001 -28.3800 < 2.2e-16 ***
factor(praxisid)0121  -7.729174   0.212690 -36.3401 < 2.2e-16 ***
factor(praxisid)0123   2.501104   0.723865   3.4552 0.0005909 ***
factor(praxisid)0124  -5.228440   0.646971  -8.0814 3.868e-15 ***
factor(praxisid)0125   0.662840   0.685234   0.9673 0.3337961    
factor(praxisid)0126  -5.809083   1.147089  -5.0642 5.552e-07 ***
factor(praxisid)0128  -8.601282   0.255758 -33.6305 < 2.2e-16 ***
factor(praxisid)0129  -3.823872   0.627748  -6.0914 2.064e-09 ***
factor(praxisid)0130   0.874424   0.286838   3.0485 0.0024069 ** 
factor(praxisid)0131  -2.550574   0.239055 -10.6694 < 2.2e-16 ***
factor(praxisid)0132  -5.731985   0.301485 -19.0125 < 2.2e-16 ***
factor(praxisid)0133  -7.729465   0.645706 -11.9706 < 2.2e-16 ***
factor(praxisid)0134  -4.198254   0.478053  -8.7820 < 2.2e-16 ***
factor(praxisid)0136   7.605662   0.431907  17.6095 < 2.2e-16 ***
factor(praxisid)0137  -3.344320   0.634414  -5.2715 1.925e-07 ***
factor(praxisid)0138  -1.925985   0.430881  -4.4699 9.451e-06 ***
factor(praxisid)0140  -5.917373   0.140893 -41.9990 < 2.2e-16 ***
factor(praxisid)0141  -5.045173   0.461395 -10.9346 < 2.2e-16 ***
factor(praxisid)0201  -3.445644   1.035772  -3.3266 0.0009358 ***
factor(praxisid)0202  -3.045525   0.407840  -7.4675 3.096e-13 ***
factor(praxisid)0203   1.130478   0.247791   4.5622 6.206e-06 ***
factor(praxisid)0204   2.145188   0.188710  11.3676 < 2.2e-16 ***
factor(praxisid)0205  -7.457343   0.506050 -14.7364 < 2.2e-16 ***
factor(praxisid)0206   1.594615   0.908716   1.7548 0.0798325 .  
factor(praxisid)0207  -0.393628   0.191666  -2.0537 0.0404602 *  
factor(praxisid)0208  -0.727459   0.249509  -2.9156 0.0036906 ** 
factor(praxisid)0209  -3.526821   0.630328  -5.5952 3.430e-08 ***
factor(praxisid)0210  -3.441156   0.363318  -9.4715 < 2.2e-16 ***
factor(praxisid)0211  -3.915442   0.465830  -8.4053 3.445e-16 ***
factor(praxisid)0212  -4.617452   0.176922 -26.0988 < 2.2e-16 ***
factor(praxisid)0213   1.557216   0.229727   6.7785 3.048e-11 ***
factor(praxisid)0214   0.158470   0.408048   0.3884 0.6978946    
factor(praxisid)0215  -1.022694   0.076523 -13.3646 < 2.2e-16 ***
factor(praxisid)0216  -6.315419   0.729673  -8.6551 < 2.2e-16 ***
factor(praxisid)0218 -11.536564   0.640052 -18.0244 < 2.2e-16 ***
factor(praxisid)0219  -6.950880   0.760465  -9.1403 < 2.2e-16 ***
factor(praxisid)0221  -4.976299   0.437071 -11.3856 < 2.2e-16 ***
factor(praxisid)0222  -6.312480   0.711347  -8.8740 < 2.2e-16 ***
factor(praxisid)0224  -3.865069   0.364790 -10.5953 < 2.2e-16 ***
factor(praxisid)0225  -4.324352   0.715237  -6.0460 2.691e-09 ***
factor(praxisid)0227   8.912021   0.275690  32.3263 < 2.2e-16 ***
factor(praxisid)0229  -6.029149   0.249882 -24.1280 < 2.2e-16 ***
factor(praxisid)0230  -2.141115   0.521678  -4.1043 4.650e-05 ***
factor(praxisid)0302   0.208030   0.101482   2.0499 0.0408297 *  
factor(praxisid)0303  -2.306601   0.125588 -18.3664 < 2.2e-16 ***
factor(praxisid)0304  -3.298788   0.311771 -10.5808 < 2.2e-16 ***
factor(praxisid)0305  -7.298794   0.338442 -21.5658 < 2.2e-16 ***
factor(praxisid)0306  -5.111923   0.370645 -13.7920 < 2.2e-16 ***
factor(praxisid)0307  -3.069838   0.544317  -5.6398 2.687e-08 ***
factor(praxisid)0308  -4.421619   0.255364 -17.3150 < 2.2e-16 ***
factor(praxisid)0309  -2.888557   0.052438 -55.0847 < 2.2e-16 ***
factor(praxisid)0310   4.273895   0.328399  13.0143 < 2.2e-16 ***
factor(praxisid)0311  -0.146787   0.206286  -0.7116 0.4770233    
factor(praxisid)0312   4.698312   0.434591  10.8109 < 2.2e-16 ***
factor(praxisid)0313  -5.562244   0.575730  -9.6612 < 2.2e-16 ***
factor(praxisid)0314   1.674841   0.350199   4.7825 2.209e-06 ***
factor(praxisid)0315  -4.792902   0.088942 -53.8881 < 2.2e-16 ***
factor(praxisid)0316   0.792484   0.313411   2.5286 0.0117223 *  
factor(praxisid)0317  -4.613136   0.531692  -8.6763 < 2.2e-16 ***
factor(praxisid)0318   3.177469   0.197673  16.0743 < 2.2e-16 ***
factor(praxisid)0319   2.173599   0.130523  16.6530 < 2.2e-16 ***
factor(praxisid)0320  -2.713980   0.183824 -14.7640 < 2.2e-16 ***
factor(praxisid)0321  -0.302488   0.304034  -0.9949 0.3201999    
factor(praxisid)0322 -11.434293   0.619287 -18.4636 < 2.2e-16 ***
factor(praxisid)0323  -4.595552   0.568874  -8.0783 3.957e-15 ***
factor(praxisid)0324  -1.941716   0.243611  -7.9706 8.705e-15 ***
factor(praxisid)0325  -2.976395   0.347162  -8.5735 < 2.2e-16 ***
factor(praxisid)0326  -5.102085   0.091025 -56.0512 < 2.2e-16 ***
factor(praxisid)0327  -1.037689   0.512567  -2.0245 0.0433865 *  
factor(praxisid)0328  -2.476252   0.330883  -7.4838 2.766e-13 ***
factor(praxisid)0329   1.803152   0.252483   7.1417 2.835e-12 ***
factor(praxisid)0330  -0.061021   0.097973  -0.6228 0.5336443    
factor(praxisid)0403  -7.604338   0.268695 -28.3010 < 2.2e-16 ***
factor(praxisid)0404   2.055911   0.754197   2.7260 0.0066094 ** 
factor(praxisid)0405 -10.506117   0.650119 -16.1603 < 2.2e-16 ***
factor(praxisid)0407  -8.029195   0.581834 -13.7998 < 2.2e-16 ***
factor(praxisid)0408  -8.220836   0.497019 -16.5403 < 2.2e-16 ***
factor(praxisid)0409  -0.680401   0.279471  -2.4346 0.0152151 *  
factor(praxisid)0411  -6.212042   0.517903 -11.9946 < 2.2e-16 ***
factor(praxisid)0412  -0.512750   0.086804  -5.9070 6.005e-09 ***
factor(praxisid)0413  -4.265538   0.299396 -14.2471 < 2.2e-16 ***
factor(praxisid)0414  -6.603663   0.644509 -10.2460 < 2.2e-16 ***
factor(praxisid)0415  -4.496306   0.655807  -6.8561 1.850e-11 ***
factor(praxisid)0502  -4.447811   0.462831  -9.6100 < 2.2e-16 ***
factor(praxisid)0504   0.695986   0.384523   1.8100 0.0708242 .  
factor(praxisid)0507  -2.071683   0.249290  -8.3103 7.054e-16 ***
factor(praxisid)0508  -1.583406   0.474791  -3.3350 0.0009088 ***
factor(praxisid)0510  -1.961047   0.346387  -5.6614 2.385e-08 ***
factor(praxisid)0511  -1.625999   0.432561  -3.7590 0.0001882 ***
factor(praxisid)0513  -0.778327   0.684504  -1.1371 0.2559895    
factor(praxisid)0514  -2.235513   0.431500  -5.1808 3.074e-07 ***
factor(praxisid)0516  -2.226234   0.512210  -4.3463 1.640e-05 ***
factor(praxisid)0519  -2.451443   0.309377  -7.9238 1.222e-14 ***
factor(praxisid)0520  -4.875180   0.702851  -6.9363 1.100e-11 ***
factor(praxisid)0522   2.433557   0.547511   4.4448 1.058e-05 ***
factor(praxisid)0524  -3.514030   0.400862  -8.7662 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1`

`

When colleagues tried to replicate my code, they did not include praxisid as factor into the model, leading to the main effect of group to disappear.

`

`>   ancova_model_base = lm(gewicht_fu2 ~ gruppe + gewicht + alter_unt + geschlecht, data=data)
>   clustered_se_base <- vcovCL(ancova_model_base, cluster = ~ praxisid, type = "HC1")
>   results_base <- coeftest(ancova_model_base, vcov = clustered_se_base)
>   print(results_base)

t test of coefficients:

             Estimate Std. Error t value  Pr(>|t|)    
(Intercept) 22.749459   6.030093  3.7727 0.0001756 ***
gruppe2     -0.186707   0.520137 -0.3590 0.7197386    
gewicht      0.927351   0.018562 49.9591 < 2.2e-16 ***
alter_unt   -0.227799   0.065403 -3.4830 0.0005275 ***
geschlecht  -0.932407   0.543146 -1.7167 0.0864922 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1`

`

I reran all models including factor(praxisid)and excluding it. All including models show a significant group effect while all those excluding it did not. This made me question my initial results, as I seem to be eliminating cluster differences by controlling for them in the model so that only within-cluster variation remains. There are many clusters, but they mostly contain very few observations. The implications are rather big as the entire intervention effects disappears. My initial line of thinking when includign the factor was to account for systemic differences between doctors offices that would be eliminated using one fixed effect per doctors office. participants werent randomized within doctors offices but doctors offices were assigned randomized to intervention vs control group. In my mind, this must mean that doctors office is a fixed effect (that is possibly confounded with group?). A fixed effect should model these explicitely rather than just having them reflected in clustered standard errors.

Which approach is the correct one here? Including the cluster variable as fixed effect in the model or having the cluster variable only in the estimation of standard errors?

GeraldineK · June 16, 2025, 1:03am

I think that the first model is more robust as it seems to take unobserved variation/heterogeneity between doctor's practices into account specifically. But it also depends on how you think the model relates to the real world? Based on your expertise, do these unobserved effects between doctor's practices matter? If so, the first model fits that thought.
I would also make sure to have a look at residuals, fitted values, etc. Which model predicts the response variable better? Are residuals normally distributed for both? Significance is only one component of statistical models.
Note that I am not saying that the second approach is "wrong": with statistical models it's important to think through how the model relates to your conception of the real world.

To be honest though, I have not used an approach such as this - I would have gone with a mixed effects model, with praxisid as random effect, especially if there are few observations for each praxisid. But of course this has a slightly different conceptual meaning: for a random effect we would assume the random intercepts are drawn from a similar (normal) distribution, and it would indicate we are not interested in the effect of the specific doctor's practice.