Hello All,
I am trying to create a paper/presentation that "teaches" tidymodels. Things work well until I get to the very end and then everything hangs. I might do the presentation as a cliffhanger. "And next time ..."
The problem is working with the object that last_fit() returns. (Louise Sinks mentions this in her blog post, "A Tidymodels Tutorial: A Structural Approach.")
Posit Cloud just hangs when I try to work with the last_fit object. It has happened even with a simple call like names(). There is no error. It just "freezes." When I log out and go back in hours later, it's still just a blank screen.
I am trying to find data I can share so that I can post a reproducible example. If anyone has experienced this problem and can guide me before I get to that, then "yeah!" I can give the code (see below). The hard part is finding a minimal data set. The data set I use is the Framingham data set from Kaggle.
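In the meantime, a simulated stand-in might be enough for a reproducible example. This is only a sketch under stated assumptions: the predictor names (age, sysBP, totChol, glucose) and every value are made up for illustration, make_fake_framingham() is a hypothetical helper, and only the TenYearCHD outcome name comes from my script. I have not confirmed that it triggers the same hang.

```r
# Simulated stand-in for framingham.csv. All values are fake; the predictor
# names are illustrative assumptions, not the real Kaggle columns.
make_fake_framingham <- function(n, na_rate = 0.05) {
  dat <- data.frame(
    age     = sample(32:70, n, replace = TRUE),
    sysBP   = round(rnorm(n, mean = 132, sd = 22), 1),
    totChol = round(rnorm(n, mean = 237, sd = 45)),
    glucose = round(rnorm(n, mean = 82, sd = 24))
  )
  # Outcome depends weakly on age and blood pressure so a model has some signal
  lin_pred <- -6 + 0.05 * dat$age + 0.012 * dat$sysBP
  dat$TenYearCHD <- factor(
    rbinom(n, size = 1, prob = plogis(lin_pred)),
    levels = c(0, 1)
  )
  # Blank out some predictor values so an imputation step has work to do
  for (col in c("totChol", "glucose")) {
    dat[[col]][sample(n, size = floor(na_rate * n))] <- NA
  }
  dat
}

set.seed(2024)
framingham_data <- make_fake_framingham(500)
```

Used in place of the read_csv() call, this keeps TenYearCHD as a factor with levels "0" and "1", so the stratified split and the .pred_0 column in the script should still line up.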
The paper is due Sunday, but I have resigned myself to not finishing it and leaving those last few steps (of using the final model) off.
Pax et bonum,
Chuck
---------------------------------------------------------------
# install.packages("glmnet")
# install.packages("tidyverse")
# install.packages("tidymodels")
library(Matrix)
library(tidyverse)
library(tidymodels)
library(readr)
tidymodels_prefer()
# Load Framingham dataset (assuming it's available in your working directory)
# Assuming 'TenYearCHD' (Ten Year Coronary Heart Disease) is the target variable
framingham_data <- read_csv("framingham.csv") %>%
  mutate(TenYearCHD = as.factor(TenYearCHD))
# ---------------------------
set.seed(111211211)
splits <- initial_split(framingham_data, strata = TenYearCHD)
# create the data sets
fram_train <- training(splits)
fram_test <- testing(splits)
# ---------------------------
set.seed(11122)
fram_cv <- vfold_cv(fram_train, v = 10, strata = TenYearCHD)
fhd_recipe <-
  recipe(TenYearCHD ~ ., data = fram_train) %>%
  step_impute_bag(all_predictors()) %>%
  step_normalize(all_predictors())
# ---------------------------------------------------
fhd_mod <-
  logistic_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")
fhd_wf <-
  workflow() %>%
  add_model(fhd_mod) %>%
  add_recipe(fhd_recipe)
fhd_lr_vec <- tibble(penalty = 10^seq(-4, -1, length.out = 30))
# ---------------------------------------------------
fhd_lr_fit <-
  fhd_wf %>%
  tune_grid(fram_cv,
            grid = fhd_lr_vec,
            control = control_grid(save_pred = TRUE),
            metrics = metric_set(accuracy, roc_auc))
fhd_lr_best_auc <-
  fhd_lr_fit %>%
  select_best(metric = "roc_auc")
fhd_lr_best_acc <-
  fhd_lr_fit %>%
  select_best(metric = "accuracy")
top_models <-
  fhd_lr_fit %>%
  show_best(metric = "roc_auc", n = 15) %>%
  arrange(penalty)
lr_best <-
  fhd_lr_fit %>%
  collect_metrics() %>%
  filter(.metric == "accuracy") %>%
  arrange(desc(mean)) %>%
  slice(1)
lr_auc <-
  fhd_lr_fit %>%
  collect_predictions(parameters = lr_best) %>%
  roc_curve(TenYearCHD, .pred_0) %>%
  mutate(model = "Logistic Regression")
# Finalize our model
#
final_wf <-
  fhd_wf %>%
  finalize_workflow(lr_best)
fhd_last <- final_wf %>%
  last_fit(splits,
           metrics = metric_set(roc_auc))
# these calls "lock up" RStudio
class(fhd_last)
names(fhd_last)
fhd_last_fit <- fhd_last %>%
  extract_fit_parsnip() %>%
  tidy()
fhd_last_fit
# -------------------------------------------
# From here on, it's very flaky
names(fhd_last_fit)
fhd_laug <- augment(fhd_last)
class(fhd_laug)
names(fhd_laug)
predns <- fhd_last %>% collect_predictions()
head(predns)
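fit <- fhd_last %>% extract_fit_parsnip()
# augment() needs a fitted workflow and new data, not the tidy() coefficient
# table stored in fhd_last_fit, so pull the workflow out of fhd_last instead
df_preds <- fhd_last %>%
  extract_workflow() %>%
  augment(new_data = fram_test)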