last_fit locking RStudio

captain_chuck · March 22, 2024, 12:57am

Hello All,

I am trying to create a paper/presentation that "teaches" tidymodels. Things work well until I get to the very end and then everything hangs. I might do the presentation as a cliffhanger. "And next time ..."

The problem comes when I work with the object returned by last_fit(). (Louise Sinks mentions this in her blog post, "Louise E. Sinks - A Tidymodels Tutorial: A Structural Approach".)

Posit Cloud just hangs when I try to work with the last_fit object. It has happened even with a simple call like names(). There is no error. It just "freezes." When I log out and (hours later) go back in, it's still just a blank screen.

I am trying to get data to post a reproducible example. If anyone has experienced this problem and can guide me before I get to that, then "yeah!" I can give the code (see below). It's just finding a minimal data set. The data set I use is the Framingham data set from Kaggle.

The paper is due Sunday, but I have resigned myself to not finishing it and leaving those last few steps (of using the final model) off.

Pax et bonum,

Chuck

---------------------------------------------------------------
# install.packages("glmnet")
# install.packages("tidyverse")
# install.packages("tidymodels")

library(Matrix)        
library(tidyverse)
library(tidymodels)
library(readr)
tidymodels_prefer()

# Load Framingham dataset (assuming it's available in your working directory)
# Assuming 'TenYearCHD' (Ten Year Coronary Heart Disease) is the target variable

framingham_data <- read_csv("framingham.csv") %>% 
  mutate(TenYearCHD = as.factor(TenYearCHD))
# ---------------------------
set.seed(111211211)
splits      <- initial_split(framingham_data, strata = TenYearCHD)

# create the data sets
fram_train <- training(splits)
fram_test  <- testing(splits)

# ---------------------------
set.seed(11122)
fram_cv <- vfold_cv(fram_train, v = 10, strata = TenYearCHD)

fhd_recipe <- 
  recipe(TenYearCHD ~ ., data = fram_train) %>% 
  step_impute_bag(all_predictors()) %>% 
  step_normalize(all_predictors())

# ---------------------------------------------------
fhd_mod <-  
  logistic_reg(penalty = tune(), mixture = 1) %>% 
  set_engine("glmnet")

fhd_wf <-
  workflow() %>% 
  add_model(fhd_mod) %>% 
  add_recipe(fhd_recipe)

fhd_lr_vec <- tibble(penalty = 10^seq(-4, -1, length.out = 30))

# --------------------------------------------------- 
fhd_lr_fit <-
  fhd_wf %>% 
  tune_grid(fram_cv,
            grid = fhd_lr_vec,
            control = control_grid(save_pred = TRUE),
            metrics = metric_set(accuracy,roc_auc))

fhd_lr_best_auc <-
  fhd_lr_fit %>% 
  select_best(metric = "roc_auc")

fhd_lr_best_acc <-
  fhd_lr_fit %>% 
  select_best(metric = "accuracy")

top_models <-
  fhd_lr_fit %>% 
  show_best(metric = "roc_auc", n = 15) %>% 
  arrange(penalty) 

lr_best <- 
  fhd_lr_fit %>% 
  collect_metrics() %>% 
  filter(.metric == "accuracy") %>% 
  arrange(desc(mean)) %>% 
  slice(1)

lr_auc <- 
  fhd_lr_fit %>% 
  collect_predictions(parameters = lr_best) %>% 
  roc_curve(TenYearCHD, .pred_0) %>% 
  mutate(model = "Logistic Regression")

# Finalize our model
# 
final_wf <- 
  fhd_wf %>% 
  finalize_workflow(lr_best)

fhd_last <- final_wf %>% 
  last_fit(splits,
           metrics = metric_set(roc_auc))

# these calls "lock up" RStudio
class(fhd_last)
names(fhd_last)

fhd_last_fit <- fhd_last %>% 
  extract_fit_parsnip() %>% 
  tidy()

fhd_last_fit
# -------------------------------------------
# From here on, it's very flaky
names(fhd_last_fit)

fhd_laug <- augment(fhd_last)
class(fhd_laug)
names(fhd_laug)

predns <- fhd_last %>% collect_predictions()
head(predns)

fit <- fhd_last %>% extract_fit_parsnip() 

df_preds <- fhd_last_fit %>% augment(new_data = NULL)

Max · March 22, 2024, 10:35am

I only saw a failure on the last two lines. I wasn't sure what you were trying to do.

If you start with a workflow, you should always predict with the workflow (not with the fhd_last_fit object). It's possible to get the wrong values (silently) since the preprocessing might not be right.

I suggest using

fit <- fhd_last %>% extract_workflow() 
df_preds <- fit %>% augment(new_data = fram_test)
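
A minimal sketch of why this matters, assuming the two_class_dat data from the modeldata package as a stand-in for the Framingham file and a plain "glm" engine instead of glmnet. The demo_* names are hypothetical and only for illustration. The fitted workflow applies the trained recipe before predicting, while the bare parsnip fit takes whatever data it is given:

```r
library(tidymodels)

# Stand-in data (an assumption, not the Framingham file from the thread)
data(two_class_dat, package = "modeldata")

set.seed(1)
demo_split <- initial_split(two_class_dat, strata = Class)

# Recipe + model bundled in a workflow, as in the original post
demo_wf <-
  workflow() %>%
  add_recipe(
    recipe(Class ~ ., data = training(demo_split)) %>%
      step_normalize(all_predictors())
  ) %>%
  add_model(logistic_reg() %>% set_engine("glm"))

demo_last <- last_fit(demo_wf, demo_split)

# Route 1: the fitted workflow normalizes the raw test data, then predicts
wf_preds <-
  demo_last %>%
  extract_workflow() %>%
  augment(new_data = testing(demo_split))

# Route 2: the parsnip fit knows nothing about the recipe, so the same raw
# data is passed to the model without being normalized first
raw_preds <-
  demo_last %>%
  extract_fit_parsnip() %>%
  augment(new_data = testing(demo_split))

# Compare the predicted probabilities from the two routes
all.equal(wf_preds$.pred_Class1, raw_preds$.pred_Class1)
```

Route 2 runs without an error here, which is what makes the mismatch easy to miss. To use the parsnip fit directly, the new data would first have to be run through the trained recipe by hand, so extracting the workflow is the simpler path.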

captain_chuck · March 22, 2024, 1:15pm

I only saw a failure on the last two lines.

Good. It could happen for me on other lines but not consistently.

I wasn't sure what you were trying to do.

Yeah, as I got to the end, I was trying all sorts of things.

If you start with a workflow, you should always predict with the workflow (not with the fhd_last_fit object).
It's possible to get the wrong values (silently) since the preprocessing might not be right.

This is probably the key. (Well, I'm sure it is. You would know.) Very good advice. I missed this in all my searching but will definitely make it clear to others. (Assuming it doesn't break, 'cause then I'm coming back.)

I suggest using

fit <- fhd_last %>% extract_workflow() 
df_preds <- fit %>% augment(new_data = fram_test)

Wonderful. Thank you! Thank you for the very quick reply, too.

Chuck
