I'm going through tidymodels in R and got to a super cool section on partial dependence profiles. I'm having trouble reproducing the code from that section: specifically, the model_profile function from DALEX throws an error about loss of precision for Year_Built, the column the text wants to explain.
Due to the structure of this chapter, it uses code from previous chapters, making it a little challenging to piece together. Here is what I believe the whole code is, including the random forest model from chapter 10. It's the last line that is giving me trouble.
library(tidymodels)
library(DALEXtra)
data(ames)
ames <- mutate(ames, Sale_Price = log10(Sale_Price))
# split the data
set.seed(502)
ames_split <- initial_split(ames, prop = 0.80, strata = Sale_Price)
ames_train <- training(ames_split)
ames_test <- testing(ames_split)
# recipe preprocessing and what is being predicted
ames_rec <-
  recipe(Sale_Price ~ Neighborhood + Gr_Liv_Area + Year_Built + Bldg_Type +
           Latitude + Longitude, data = ames_train) %>%
  step_log(Gr_Liv_Area, base = 10) %>%
  step_other(Neighborhood, threshold = 0.01) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_interact( ~ Gr_Liv_Area:starts_with("Bldg_Type_") ) %>%
  step_ns(Latitude, Longitude, deg_free = 20)
# random forest
rf_model <-
  rand_forest(trees = 1000) %>%
  set_engine("ranger") %>%
  set_mode("regression")
# workflow, add formula rather than recipe
# minimal to no preprocessing needed
rf_wflow <-
  workflow() %>%
  add_formula(
    Sale_Price ~ Neighborhood + Gr_Liv_Area + Year_Built + Bldg_Type +
      Latitude + Longitude) %>%
  add_model(rf_model)
# normal fitting example
rf_fit <- rf_wflow %>%
  fit(ames_train)
# isolate features
vip_features <- c("Neighborhood", "Gr_Liv_Area", "Year_Built",
                  "Bldg_Type", "Latitude", "Longitude")
vip_train <-
  ames_train %>%
  select(all_of(vip_features))
# create a DALEX explainer
explainer_rf <-
  explain_tidymodels(
    rf_fit,
    data = vip_train,
    y = ames_train$Sale_Price,
    label = "random forest",
    verbose = FALSE
  )
# this doesn't work on Year_Built
set.seed(1805)
pdp_age <- model_profile(explainer_rf, N = 500, variables = "Year_Built")
I end up getting this error:
> pdp_age <- model_profile(explainer_rf, N = 500, variables = "Year_Built")
Error in `stop_vctrs()`:
! Can't convert from `Year_Built` <double> to `Year_Built` <integer> due to loss of precision.
• Locations: 2, 3, 5, 13, 14, 49, 53, 72, 73, 75, 83, 84, 119, 123, 142, 143, 145, 153, 154, 189, 193, 212, ...
Run `rlang::last_error()` to see where the error occurred.
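My best guess so far, which I have not confirmed: Year_Built is stored as an integer, and the profile may be evaluated on a grid of fractional values that cannot be cast back to integer. The base R sketch below only illustrates that idea. The year_built vector is a made-up stand-in for the real column, and the assumption that the grid comes from quantiles of the column is mine, not something I have verified in the DALEX source.

```r
# Hypothetical stand-in for an integer column such as Year_Built
year_built <- c(1950L, 1975L, 1999L, 2006L)

# Assumption: the profile is evaluated on a grid of quantiles of the column
grid <- unname(quantile(year_built, probs = seq(0, 1, length.out = 6)))

# Quantiles of integer data are doubles and are usually fractional
is_whole <- grid == trunc(grid)

# Positions that could not be stored as integer without losing precision
lossy_positions <- which(!is_whole)

print(grid)
print(lossy_positions)
```

If that is what is happening, the "Locations" in the error message would be the rows of the generated grid whose Year_Built value is not a whole number.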
I'm finding it difficult to piece together the code from previous chapters. Any insights into why this code won't run?
Always learning,
Zach