Hi everyone,
As the title says, I have a problem with a GLMM model using lme4 which outputs some extremely suspicious values. This is my first time using glmer, as I only used lmer in the past (with no problem). I have some dataset within which my predicted value is positive and rightly-skewed, and I wanted to try fitting a model using a Gamma distribution with an Identity link. I grew immediately suspicious when, on my real dataset that is only a pilot before the real data collection, I saw that p-values are <2e-16 and the conditional r² from the Performance package is 1 (!). This is supposed to be very noisy data and I have a very simple model.
I have a reprex to share (small subset of the data, so obviously those are not my actual outputs), on which I compare a "regular" fit with a normal distribution, which has quite similar coefficient estimates, but very different results in terms of standard error (and so, p-values):
library(tidyverse)
library(lme4)
#> Loading required package: Matrix
#>
#> Attaching package: 'Matrix'
#> The following objects are masked from 'package:tidyr':
#>
#> expand, pack, unpack
library(performance)
data <- tibble::tribble(
~A, ~R, ~PID,
11254, 4, "55945",
2666, 6, "55372",
3646, 5, "47725",
11390, 6, "46837",
4204, 2, "55372",
4474, 7, "55453",
20266, 7, "55600",
2239, 5, "55600",
3068, 2, "55945",
8429, 5, "55372",
26603, 4, "55600",
15346, 4, "55534",
20866, 3, "47725",
6143, 3, "55945",
12870, 6, "47998",
3127, 6, "46837",
4426, 5, "47725",
11388, 5, "46837",
14675, 4, "55945",
4262, 4, "47998",
23464, 7, "46837",
2184, 2, "47725",
3295, 3, "55945",
1794, 3, "55945",
6292, 2, "55372",
18071, 6, "55600",
13374, 5, "55945",
2809, 7, "55453",
3381, 7, "55453",
10847, 2, "55945",
6332, 7, "46837",
30487, 6, "55600",
9745, 5, "55453",
3993, 3, "55372",
4906, 7, "46837",
33046, 5, "55945",
2115, 4, "55600",
1889, 7, "47725",
5785, 6, "55453",
4139, 7, "46837",
8360, 2, "55372",
4644, 5, "46837",
7727, 5, "55372",
2484, 4, "55945",
1987, 6, "47998",
5154, 7, "55600",
4342, 7, "46837",
15274, 6, "55534",
2000, 1, "47998",
7160, 5, "46837"
)
fit <- glmer(A ~ R + (1 |PID),
data=data, family=Gamma(link = identity))
fit_normal <- lmer(A ~ R + (1 |PID),
data=data)
summary(fit)
#> Generalized linear mixed model fit by maximum likelihood (Laplace
#> Approximation) [glmerMod]
#> Family: Gamma ( identity )
#> Formula: A ~ R + (1 | PID)
#> Data: data
#>
#> AIC BIC logLik deviance df.resid
#> 1009.8 1017.4 -500.9 1001.8 46
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -0.9758 -0.7193 -0.3880 0.5272 3.1221
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> PID (Intercept) 3.32e+06 1822.1715
#> Residual 6.76e-01 0.8222
#> Number of obs: 50, groups: PID, 8
#>
#> Fixed effects:
#> Estimate Std. Error t value Pr(>|z|)
#> (Intercept) 5606.4 183.0 30.643 < 2e-16 ***
#> R 595.0 141.7 4.198 2.69e-05 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Correlation of Fixed Effects:
#> (Intr)
#> R -0.145
summary(fit_normal)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: A ~ R + (1 | PID)
#> Data: data
#>
#> REML criterion at convergence: 1004.8
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -1.1413 -0.6782 -0.3590 0.4081 3.0941
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> PID (Intercept) 4357653 2087
#> Residual 57017892 7551
#> Number of obs: 50, groups: PID, 8
#>
#> Fixed effects:
#> Estimate Std. Error t value
#> (Intercept) 6075.5 3546.2 1.713
#> R 547.5 678.8 0.807
#>
#> Correlation of Fixed Effects:
#> (Intr)
#> R -0.928
r2(fit)
#> # R2 for Mixed Models
#>
#> Conditional R2: 1.000
#> Marginal R2: 0.244
r2(fit_normal)
#> # R2 for Mixed Models
#>
#> Conditional R2: 0.085
#> Marginal R2: 0.015
I think the r2 package is using the wrong type of residuals (as resid(fit) by default returns some extremely small residuals while it returns the right ones for resid(fit_normal)), which explains the weird r².
However even if this is why the r2 is wrong, the problem of very small standard errors of fixed effects coefficients and incredibly low p-values stays (the reprex does not adequately capture this, but in my original dataset, even though coefficient estimates are very similar in both models, the standard error is 20 times lower for the coefficient of the R variable, and 110 times lower for the intercept). I have another similar dataset with a much more complex model, and all fixed effects are also p<2e-16 when using glmer and the Gamma distribution.
So all in all, I am quite unsure if it's me who misunderstood something fundamental in the way glmer works, if I did a mistake somewhere in my code, or if there is a bug in lme4.
Any help would be appreciated, thank you very much!