Loading a model stack from .Rds with {lightgbm} among the candidates

stargeysir · December 19, 2022, 1:56pm

There is a known issue with loading models from .Rds files that were fitted with the {lightgbm} engine. It can be circumvented by using the dedicated saveRDS.lgb.Booster() and readRDS.lgb.Booster() functions provided by the {lightgbm} package, as demonstrated in this reprex from GitHub:

library(tidymodels)
library(bonsai)
library(lightgbm)

# data

data <- modeldata::ames %>%
  janitor::clean_names()

data <- subset(data, select = c(sale_price, bedroom_abv_gr, bsmt_full_bath, bsmt_half_bath, enclosed_porch, fireplaces,
                                full_bath, half_bath, kitchen_abv_gr, garage_area, garage_cars, gr_liv_area, lot_area,
                                lot_frontage, year_built, year_remod_add, year_sold))

data$id <- c(1:nrow(data))

data <- data %>%
  mutate(id = as.character(id)) %>%
  select(id, everything())

# model specification

lgbm_model <- boost_tree(
  mtry = 7,
  trees = 347,
  min_n = 10,
  tree_depth = 12,
  learn_rate = 0.0106430579211173,
  loss_reduction = 0.000337948798058139
) %>%
  set_mode("regression") %>%
  set_engine("lightgbm", objective = "regression")

# recipe and workflow

lgbm_recipe <- recipe(sale_price ~., data = data) %>%
  update_role(id, new_role = "ID") %>%
  step_corr(all_predictors(), threshold = 0.7)

lgbm_workflow <- workflow(preprocessor = lgbm_recipe,
                          spec = lgbm_model)

# fit workflow

fit_lgbm_workflow <- lgbm_workflow %>%
  fit(data = data)

# predict

data_predict <- subset(data, select = -c(sale_price))
predict(fit_lgbm_workflow, new_data = data_predict)
#> # A tibble: 2,930 × 1
#>      .pred
#>      <dbl>
#>  1 201911.
#>  2 124695.
#>  3 138983.
#>  4 221095.
#>  5 198972.
#>  6 188613.
#>  7 198730.
#>  8 170893.
#>  9 243899.
#> 10 196875.
#> # … with 2,920 more rows

# save the trained workflow and the lgb.Booster object separately

saveRDS(fit_lgbm_workflow, "lgbm_wflw.rds")
saveRDS.lgb.Booster(extract_fit_engine(fit_lgbm_workflow), "lgbm_booster.rds")

# load the trained workflow and re-attach the natively loaded lgb.Booster

new_lgbm_wflow <- readRDS("lgbm_wflw.rds")
new_lgbm_wflow$fit$fit$fit <- readRDS.lgb.Booster("lgbm_booster.rds")

predict(new_lgbm_wflow, data_predict)
#> # A tibble: 2,930 × 1
#>      .pred
#>      <dbl>
#>  1 201911.
#>  2 124695.
#>  3 138983.
#>  4 221095.
#>  5 198972.
#>  6 188613.
#>  7 198730.
#>  8 170893.
#>  9 243899.
#> 10 196875.
#> # … with 2,920 more rows

Created on 2022-09-07 with reprex v2.0.2

While this works for single models, I now need to write a model stack that includes a {lightgbm} candidate to disk and read it back. How can I go about this, given that the regular readr::write_rds() and readr::read_rds() functions again lead to the dreaded error associated with {lightgbm} models read from .Rds files?

  Attempting to use a Booster which no longer exists. This can happen if you have called Booster$finalize() or if this Booster was saved with saveRDS(). To avoid this error in the future, use saveRDS.lgb.Booster() or Booster$save_model() to save lightgbm Boosters.

simoncouch · December 19, 2022, 2:17pm

Hi there!

The development version of the lightgbm R package supports saving with saveRDS()/readRDS() as normal and will be hitting CRAN in the next few months, so this will "just work" soon. We've opted not to support lightgbm in {bundle} in anticipation of that release. If you'd like to stack with lightgbm ahead of the release, you can install the dev version of lightgbm and use it as an engine in tidymodels as usual.

If using the dev version is not an option for you, note that the lightgbm workflow fits are housed in the "member_fits" slot of a fitted model stack. In a pinch, you could use the

subset out -> save natively -> load natively -> subset in

idiom for each fitted member, as suggested in the reprex you've linked to. That approach is painful, but will no longer be needed once lightgbm releases 4.0.0.
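
For readers who want to try the idiom above on a whole stack, here is a minimal sketch, not a tested or official recipe. It assumes a lightgbm release before 4.0.0 (where saveRDS.lgb.Booster() and readRDS.lgb.Booster() are still available), that each element of the stack's member_fits is a fitted workflow, and that the engine fit sits at $fit$fit$fit as in the reprex. The helper names save_stack_with_lgbm() and load_stack_with_lgbm() and the file names are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: save a fitted model stack, plus a native copy of every
# lightgbm booster found among its member fits.
save_stack_with_lgbm <- function(model_stack, dir) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)

  # The stack itself goes through plain saveRDS(); any lightgbm boosters
  # inside it will be unusable after readRDS() and are re-attached on load.
  saveRDS(model_stack, file.path(dir, "stack.rds"))

  for (i in seq_along(model_stack$member_fits)) {
    # "subset out": pull the engine-level fit from the member workflow
    engine_fit <- workflows::extract_fit_engine(model_stack$member_fits[[i]])

    if (inherits(engine_fit, "lgb.Booster")) {
      # "save natively"
      lightgbm::saveRDS.lgb.Booster(
        engine_fit,
        file.path(dir, sprintf("member_%03d_booster.rds", i))
      )
    }
  }

  invisible(dir)
}

# Hypothetical helper: load the stack and put the natively saved boosters back.
load_stack_with_lgbm <- function(dir) {
  model_stack <- readRDS(file.path(dir, "stack.rds"))

  for (i in seq_along(model_stack$member_fits)) {
    booster_path <- file.path(dir, sprintf("member_%03d_booster.rds", i))

    if (file.exists(booster_path)) {
      # "load natively" and "subset in", using the same slot as the reprex
      model_stack$member_fits[[i]]$fit$fit$fit <-
        lightgbm::readRDS.lgb.Booster(booster_path)
    }
  }

  model_stack
}
```

Usage would look like save_stack_with_lgbm(fitted_stack, "my_stack") followed later by fitted_stack <- load_stack_with_lgbm("my_stack"), where fitted_stack is the result of fit_members(). Members that are not lightgbm fits are left untouched.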

stargeysir · December 20, 2022, 9:06am

Great news about the dev version being put on CRAN soon; this will make {bonsai} a lot friendlier to work with. Thank you for your answer!
