I am trying to use the tidymodels package to build a linear mixed model. It looks like I'm specifying the formula in the correct way, as I can run a fit() on the workflow.
However, when I try to run it on resamples of my data, using the function fit_resamples(), I get an error relative to missing factor levels.
It is not clear to me if I'm doing something wrong, or if the packages "multilevelmod" and "tune" are not compatible in this way, and so I would greatly appreciate any advice.
I have included a reprex using the "mpg" dataset.
packages = c(
"tidyverse",
"tidymodels",
"multilevelmod"
)
lapply(packages, library, character.only = TRUE)
#> Registered S3 method overwritten by 'tune':
#> method from
#> required_pkgs.model_spec parsnip
#> Warning: package 'parsnip' was built under R version 4.1.3
#> Warning: package 'multilevelmod' was built under R version 4.1.3
#> [[1]]
#> [1] "forcats" "stringr" "dplyr" "purrr" "readr" "tidyr"
#> [7] "tibble" "ggplot2" "tidyverse" "stats" "graphics" "grDevices"
#> [13] "utils" "datasets" "methods" "base"
#>
#> [[2]]
#> [1] "yardstick" "workflowsets" "workflows" "tune" "rsample"
#> [6] "recipes" "parsnip" "modeldata" "infer" "dials"
#> [11] "scales" "broom" "tidymodels" "forcats" "stringr"
#> [16] "dplyr" "purrr" "readr" "tidyr" "tibble"
#> [21] "ggplot2" "tidyverse" "stats" "graphics" "grDevices"
#> [26] "utils" "datasets" "methods" "base"
#>
#> [[3]]
#> [1] "multilevelmod" "yardstick" "workflowsets" "workflows"
#> [5] "tune" "rsample" "recipes" "parsnip"
#> [9] "modeldata" "infer" "dials" "scales"
#> [13] "broom" "tidymodels" "forcats" "stringr"
#> [17] "dplyr" "purrr" "readr" "tidyr"
#> [21] "tibble" "ggplot2" "tidyverse" "stats"
#> [25] "graphics" "grDevices" "utils" "datasets"
#> [29] "methods" "base"
data(mpg, package = "ggplot2")
training = mpg %>%
initial_split() %>%
training()
training_folds = training %>%
validation_split()
lmm_model = linear_reg() %>%
set_engine("lmer")
lmm_workflow = workflow() %>%
add_variables(outcomes = cty,
predictors = c(year, manufacturer, model)) %>%
add_model(lmm_model, formula = cty ~ year + (1|manufacturer/model))
# A simple fit works
fit(lmm_workflow, training)
#> == Workflow [trained] ==========================================================
#> Preprocessor: Variables
#> Model: linear_reg()
#>
#> -- Preprocessor ----------------------------------------------------------------
#> Outcomes: cty
#> Predictors: c(year, manufacturer, model)
#>
#> -- Model -----------------------------------------------------------------------
#> Linear mixed model fit by REML ['lmerMod']
#> Formula: cty ~ year + (1 | manufacturer/model)
#> Data: data
#> REML criterion at convergence: 831.6647
#> Random effects:
#> Groups Name Std.Dev.
#> model:manufacturer (Intercept) 2.913
#> manufacturer (Intercept) 2.328
#> Residual 1.936
#> Number of obs: 175, groups: model:manufacturer, 38; manufacturer, 15
#> Fixed Effects:
#> (Intercept) year
#> -64.37893 0.04037
# A fit with resamplings doesn't work:
fit_resamples(lmm_workflow, resamples = training_folds)
#> x validation: preprocessor 1/1, model 1/1 (predictions): Error: Assigned data `factor(...
#> Warning: All models failed. See the `.notes` column.
#> Warning: This tuning result has notes. Example notes on model fitting include:
#> preprocessor 1/1, model 1/1 (predictions): Error: Assigned data `factor(lvl[1], levels = lvl)` must be compatible with existing data.
#> x Existing data has 44 rows.
#> x Assigned data has 0 rows.
#> i Only vectors of size 1 are recycled.
#> # Resampling results
#> # Validation Set Split (0.75/0.25)
#> # A tibble: 1 x 4
#> splits id .metrics .notes
#> <list> <chr> <list> <list>
#> 1 <split [131/44]> validation <NULL> <tibble [1 x 1]>
Created on 2022-05-23 by the reprex package (v2.0.1)