Suspicious values for a GLMM model

nlarigaldie · February 12, 2024, 2:02am

Hi everyone,

As the title says, I have a problem with a GLMM model using lme4 which outputs some extremely suspicious values. This is my first time using glmer, as I only used lmer in the past (with no problem). I have some dataset within which my predicted value is positive and rightly-skewed, and I wanted to try fitting a model using a Gamma distribution with an Identity link. I grew immediately suspicious when, on my real dataset that is only a pilot before the real data collection, I saw that p-values are <2e-16 and the conditional r² from the Performance package is 1 (!). This is supposed to be very noisy data and I have a very simple model.

I have a reprex to share (small subset of the data, so obviously those are not my actual outputs), on which I compare a "regular" fit with a normal distribution, which has quite similar coefficient estimates, but very different results in terms of standard error (and so, p-values):

library(tidyverse)
library(lme4)
#> Loading required package: Matrix
#> 
#> Attaching package: 'Matrix'
#> The following objects are masked from 'package:tidyr':
#> 
#>     expand, pack, unpack
library(performance)

data <- tibble::tribble(
                                                           ~A,      ~R,        ~PID,
                                                        11254,       4,     "55945",
                                                         2666,       6,     "55372",
                                                         3646,       5,     "47725",
                                                        11390,       6,     "46837",
                                                         4204,       2,     "55372",
                                                         4474,       7,     "55453",
                                                        20266,       7,     "55600",
                                                         2239,       5,     "55600",
                                                         3068,       2,     "55945",
                                                         8429,       5,     "55372",
                                                        26603,       4,     "55600",
                                                        15346,       4,     "55534",
                                                        20866,       3,     "47725",
                                                         6143,       3,     "55945",
                                                        12870,       6,     "47998",
                                                         3127,       6,     "46837",
                                                         4426,       5,     "47725",
                                                        11388,       5,     "46837",
                                                        14675,       4,     "55945",
                                                         4262,       4,     "47998",
                                                        23464,       7,     "46837",
                                                         2184,       2,     "47725",
                                                         3295,       3,     "55945",
                                                         1794,       3,     "55945",
                                                         6292,       2,     "55372",
                                                        18071,       6,     "55600",
                                                        13374,       5,     "55945",
                                                         2809,       7,     "55453",
                                                         3381,       7,     "55453",
                                                        10847,       2,     "55945",
                                                         6332,       7,     "46837",
                                                        30487,       6,     "55600",
                                                         9745,       5,     "55453",
                                                         3993,       3,     "55372",
                                                         4906,       7,     "46837",
                                                        33046,       5,     "55945",
                                                         2115,       4,     "55600",
                                                         1889,       7,     "47725",
                                                         5785,       6,     "55453",
                                                         4139,       7,     "46837",
                                                         8360,       2,     "55372",
                                                         4644,       5,     "46837",
                                                         7727,       5,     "55372",
                                                         2484,       4,     "55945",
                                                         1987,       6,     "47998",
                                                         5154,       7,     "55600",
                                                         4342,       7,     "46837",
                                                        15274,       6,     "55534",
                                                         2000,       1,     "47998",
                                                         7160,       5,     "46837"
                                                   )

fit <- glmer(A ~ R + (1 |PID),
             data=data, family=Gamma(link = identity))

fit_normal <- lmer(A ~ R + (1 |PID),
                   data=data)

summary(fit)
#> Generalized linear mixed model fit by maximum likelihood (Laplace
#>   Approximation) [glmerMod]
#>  Family: Gamma  ( identity )
#> Formula: A ~ R + (1 | PID)
#>    Data: data
#> 
#>      AIC      BIC   logLik deviance df.resid 
#>   1009.8   1017.4   -500.9   1001.8       46 
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -0.9758 -0.7193 -0.3880  0.5272  3.1221 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev. 
#>  PID      (Intercept) 3.32e+06 1822.1715
#>  Residual             6.76e-01    0.8222
#> Number of obs: 50, groups:  PID, 8
#> 
#> Fixed effects:
#>             Estimate Std. Error t value Pr(>|z|)    
#> (Intercept)   5606.4      183.0  30.643  < 2e-16 ***
#> R              595.0      141.7   4.198 2.69e-05 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Correlation of Fixed Effects:
#>   (Intr)
#> R -0.145
summary(fit_normal)
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: A ~ R + (1 | PID)
#>    Data: data
#> 
#> REML criterion at convergence: 1004.8
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -1.1413 -0.6782 -0.3590  0.4081  3.0941 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  PID      (Intercept)  4357653 2087    
#>  Residual             57017892 7551    
#> Number of obs: 50, groups:  PID, 8
#> 
#> Fixed effects:
#>             Estimate Std. Error t value
#> (Intercept)   6075.5     3546.2   1.713
#> R              547.5      678.8   0.807
#> 
#> Correlation of Fixed Effects:
#>   (Intr)
#> R -0.928
r2(fit)
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 1.000
#>      Marginal R2: 0.244
r2(fit_normal)
#> # R2 for Mixed Models
#> 
#>   Conditional R2: 0.085
#>      Marginal R2: 0.015

I think the r2 package is using the wrong type of residuals (as resid(fit) by default returns some extremely small residuals while it returns the right ones for resid(fit_normal)), which explains the weird r².

However even if this is why the r2 is wrong, the problem of very small standard errors of fixed effects coefficients and incredibly low p-values stays (the reprex does not adequately capture this, but in my original dataset, even though coefficient estimates are very similar in both models, the standard error is 20 times lower for the coefficient of the R variable, and 110 times lower for the intercept). I have another similar dataset with a much more complex model, and all fixed effects are also p<2e-16 when using glmer and the Gamma distribution.

So all in all, I am quite unsure if it's me who misunderstood something fundamental in the way glmer works, if I did a mistake somewhere in my code, or if there is a bug in lme4.

Any help would be appreciated, thank you very much!

nirgrahamuk · February 12, 2024, 10:09am

I don't understand why you would be surprised at high performance, when you use PID to in essence fit multiple models, one for each data point. I expect that you simply shouldn't be doing this. There is no noise when each measurement gets its own model, but probably no scientific value either.

There is almost no context to what you are seeking assistance with, which limits how well one can advise.

nlarigaldie · February 12, 2024, 12:15pm

Thanks for your reply!

Apologies, I realize now that my reprex was not very well chosen: I took a random splice of my real data to avoid giving a very long code, but in the real data I have of course many datapoints per PID (~240), and only 8 PIDs. So I am not trying to fit one model for each datapoint. If you want, I can provide you with a better-chosen splice, and I'm sure the results will be the same.

What surprises me is that on my real data, the sum of squares of errors from the glmer and the lmer models (calculated "by hand" using predict()) are very close to one another, yet the conditional r² given by the Performance package is 1 for the glmer model, and nowhere close to that on the lmer one. So basically, errors made by each model are the same, but in one case the fit is reported to be nearly perfect, and in the other it's not. If some kind of overfitting was at play here because of the low number of datapoints per PID, considering that the gamma and normal distributed models are otherwise exactly the same, I do not see why this would be a huge problem in the glmer model but not in the lmer one, even in the reprex. If that was the problem, surely I would expect the conditional r² to be 1 for the lmer model too, right?

nirgrahamuk · February 12, 2024, 12:54pm

ok, this may be a bug in performance rather than lme4.
I think it boils down to the defaults in how residuals are reported with lmer its 'response' and glmer its 'deviance'

library(lme4)
library(performance)
(fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy))



(fm2 <- glmer(Reaction ~ Days + (Days | Subject), sleepstudy,
              family = Gamma(link="identity")))

performance(fm1)
performance(fm2)

head(residuals(fm1))
head(residuals(fm2)) # defaults to a deviance response 
head(residuals(fm2,type="response")) # we can get the response but how to tell performance this ?

I believe the issues tracker for performance is at

nlarigaldie · February 12, 2024, 3:15pm

Indeed! I also suspected the same thing, so thanks a lot for the confirmation. I since had a look at the code from Performance, and apparently it's in their ToDo list, so they're aware of the potential problem. I'll post something on the issues anyway, just in case.

I think my trust issues in the coefficients themselves was in major part driven by this unexpected result from r², so I'll mark the thread as solved.

Thanks again for your help!

nirgrahamuk · February 12, 2024, 3:22pm

You're welcome, thanks for the interesting question

system · February 19, 2024, 3:23pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.