Tidymodels' "numeric" predict gives different results from lme4's "response" predict for models with weights.

kydare · October 4, 2022, 11:20pm

Good day,

I'd like to ask about the difference in predictions delivered by type="numeric" compared to type="response" when working with weighted glmer models.
Here's a minimal working example to reproduce the issue:

library(lme4)
library(tidymodels)
library(multilevelmod)

data("mtcars")
set.seed(42)
car_split <- mtcars %>%
  mutate(RE = c(rep("A", 10), rep("B", 10), rep("C", 12)),
         weights = sample.int(10, size = 32, replace = TRUE)) %>%
  mutate(y = runif(nrow(.))) %>%
  mutate(weights_freq = frequency_weights(weights)) %>%
  initial_split()

train_data <- training(car_split)
test_data <- testing(car_split)


# GLMER Models with weights:
## in lme4
mod_lme4_w <-  glmer(y ~ wt + hp + (1 | RE),
  data = train_data,
  family = binomial(link = logit),
  nAGQ = 0,
  weights = weights
)
## in tidymodels
mod_tm_w <- linear_reg() %>%
  set_engine("glmer",
    family = stats::binomial(link = "logit"),
    nAGQ = 0,
  ) %>%
  translate()
workflow_tm_w <- workflow() %>%
  add_variables(outcomes = y, predictors = c(wt, hp, RE)) %>%
  add_model(mod_tm_w, formula = y ~ wt + hp + (1 | RE)) %>%
  add_case_weights(weights_freq) %>%
  fit(train_data)

## Predictions
pred_lme4_w <- predict(mod_lme4_w, type="response", test_data)
pred_lme4_raw_w <- predict(mod_lme4_w, test_data)
pred_tm_w <- predict(workflow_tm_w, test_data)
pred_tm_raw_w <- predict(workflow_tm_w, type="raw", test_data)

## Compare predictions
identical(unname(pred_lme4_w), pred_tm_w$.pred) # FALSE
identical(unname(pred_lme4_raw_w), unname(pred_tm_raw_w)) # TRUE

As you can see, with type="raw" the tidymodels predictions are exactly the same as the ones returned by lme4's default predict(). However, when I compare type="numeric" (tidymodels) against type="response" (lme4), there is quite a significant difference in predictions. It does not happen with every data partition (depending on the seed, the predictions can also match), and it does not occur with regular GLM models or with GLMER models without weights.

As such, I'd like to ask what may have caused this, how I can mitigate it, and whether I can pass a parameter so that fit_resamples() uses the "raw" values to calculate metrics.

With respect,
Kydare.

hannah · October 13, 2022, 12:57pm

For new samples that were not part of the training set, a prediction of type "numeric" is the population-level estimate, computed without the random effects. More details are in the engine documentation.
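
To illustrate the distinction in plain lme4, here is a minimal sketch. It uses lme4's built-in cbpp data rather than the mtcars example above, and it assumes that the "population estimate" corresponds to what lme4 returns with re.form = NA; how the tidymodels engine computes it internally is not shown here.

```r
library(lme4)

# Random-intercept logistic model on lme4's built-in cbpp data
# (an illustrative stand-in, not the model from the question).
fit <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
             data = cbpp, family = binomial)

new_rows <- cbpp[1:4, ]

# Default: conditional on the estimated random effect of each herd.
pred_conditional <- predict(fit, newdata = new_rows, type = "response")

# re.form = NA drops all random effects: the population-level estimate.
pred_population <- predict(fit, newdata = new_rows, type = "response",
                           re.form = NA)

# The population-level prediction is the inverse link applied to the
# fixed-effects linear predictor only.
X <- model.matrix(~ period, data = new_rows)
manual <- plogis(as.vector(X %*% fixef(fit)))

all.equal(unname(pred_population), manual)
```

If this assumption holds, passing re.form = NA to lme4's predict() in the comparison above should be the like-for-like counterpart of the "numeric" predictions, whereas the default predict() call includes the random effects.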
