There is a known issue with loading models from .Rds files when they were fitted with the {lightgbm} engine. It can be circumvented by using the dedicated saveRDS.lgb.Booster() and readRDS.lgb.Booster() functions provided by the {lightgbm} package, as demonstrated in this reprex from GitHub:
library(tidymodels)
library(bonsai)
library(lightgbm)
# data
data <- modeldata::ames %>%
  janitor::clean_names()
data <- subset(data, select = c(sale_price, bedroom_abv_gr, bsmt_full_bath, bsmt_half_bath, enclosed_porch, fireplaces,
                                full_bath, half_bath, kitchen_abv_gr, garage_area, garage_cars, gr_liv_area, lot_area,
                                lot_frontage, year_built, year_remod_add, year_sold))
data$id <- c(1:nrow(data))
data <- data %>%
  mutate(id = as.character(id)) %>%
  select(id, everything())
# model specification
lgbm_model <- boost_tree(
  mtry = 7,
  trees = 347,
  min_n = 10,
  tree_depth = 12,
  learn_rate = 0.0106430579211173,
  loss_reduction = 0.000337948798058139
) %>%
  set_mode("regression") %>%
  set_engine("lightgbm", objective = "regression")
# recipe and workflow
lgbm_recipe <- recipe(sale_price ~ ., data = data) %>%
  update_role(id, new_role = "ID") %>%
  step_corr(all_predictors(), threshold = 0.7)
lgbm_workflow <- workflow(preprocessor = lgbm_recipe,
                          spec = lgbm_model)
# fit workflow
fit_lgbm_workflow <- lgbm_workflow %>%
  fit(data = data)
# predict
data_predict <- subset(data, select = -c(sale_price))
predict(fit_lgbm_workflow, new_data = data_predict)
#> # A tibble: 2,930 × 1
#> .pred
#> <dbl>
#> 1 201911.
#> 2 124695.
#> 3 138983.
#> 4 221095.
#> 5 198972.
#> 6 188613.
#> 7 198730.
#> 8 170893.
#> 9 243899.
#> 10 196875.
#> # … with 2,920 more rows
# save the trained workflow and the lgb.Booster object separately
saveRDS(fit_lgbm_workflow, "lgbm_wflw.rds")
saveRDS.lgb.Booster(extract_fit_engine(fit_lgbm_workflow), "lgbm_booster.rds")
# load the trained workflow and swap in the restored lgb.Booster
new_lgbm_wflow <- readRDS("lgbm_wflw.rds")
new_lgbm_wflow$fit$fit$fit <- readRDS.lgb.Booster("lgbm_booster.rds")
predict(new_lgbm_wflow, data_predict)
#> # A tibble: 2,930 × 1
#> .pred
#> <dbl>
#> 1 201911.
#> 2 124695.
#> 3 138983.
#> 4 221095.
#> 5 198972.
#> 6 188613.
#> 7 198730.
#> 8 170893.
#> 9 243899.
#> 10 196875.
#> # … with 2,920 more rows
Created on 2022-09-07 with reprex v2.0.2
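For reference, the save and restore steps from the reprex above can be wrapped into two small helpers. This is only a sketch: the function names `save_lgbm_workflow()` and `load_lgbm_workflow()` and the file-name suffixes are hypothetical, and it assumes a {lightgbm} version that still exports saveRDS.lgb.Booster() and readRDS.lgb.Booster(), as the reprex does.

```r
# Hypothetical helpers (illustrative names, not part of any package).
# Assumption: the installed {lightgbm} still exports saveRDS.lgb.Booster()
# and readRDS.lgb.Booster(), as used in the reprex.

save_lgbm_workflow <- function(wflow, prefix) {
  paths <- c(
    workflow = paste0(prefix, "_wflw.rds"),
    booster  = paste0(prefix, "_booster.rds")
  )
  # The workflow object is written with plain saveRDS(); the booster it
  # contains will be stale on reload, so it is written a second time below.
  saveRDS(wflow, paths[["workflow"]])
  # Write the underlying lgb.Booster with the {lightgbm} writer.
  lightgbm::saveRDS.lgb.Booster(
    workflows::extract_fit_engine(wflow),
    paths[["booster"]]
  )
  invisible(paths)
}

load_lgbm_workflow <- function(prefix) {
  wflow <- readRDS(paste0(prefix, "_wflw.rds"))
  # Replace the stale booster with the one restored by {lightgbm}.
  wflow$fit$fit$fit <- lightgbm::readRDS.lgb.Booster(
    paste0(prefix, "_booster.rds")
  )
  wflow
}

# Usage with the objects from the reprex:
# save_lgbm_workflow(fit_lgbm_workflow, "lgbm")
# new_lgbm_wflow <- load_lgbm_workflow("lgbm")
# predict(new_lgbm_wflow, data_predict)
```

The helpers only repackage the single-workflow workaround; they do not by themselves handle a model stack.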
While this works for single models, I now need to write a model stack that includes a {lightgbm} candidate to disk and read it back. How can I go about this when the regular readr::write_rds() and readr::read_rds() functions again produce the error below, which {lightgbm} raises for models read from .Rds files?
Attempting to use a Booster which no longer exists. This can happen if you have called Booster$finalize() or if this Booster was saved with saveRDS(). To avoid this error in the future, use saveRDS.lgb.Booster() or Booster$save_model() to save lightgbm Boosters.