Logistic regression model gives different results while tuning vs single model for LOOCV

I’ve made a small reproducible example using a data frame with a grouping variable and 3 features. I have code that fits a single logistic regression model and code that I’ve used for tuning. For simplicity, the example only uses penalty = 0.001 and mixture = 0, and compares the accuracy, ROC AUC, and class predictions. I understand that LOOCV with a pathologically small sample size is not ideal.

A summary of the results is below. I believe the results should be identical, and I’m wondering if I am doing something wrong or not fully understanding what’s happening.

Accuracy:
• single model = 0.52
• tuned model = 0.52

LOOCV AUC:
• single model = 0.532
• tuned model = 0.538

.pred_group2 for .row 1:
• single model = 0.331
• tuned model = 0.324

# ---- Load libraries and prepare data -----------------------------------------

library(tidymodels)

set.seed(1)

df <- data.frame(
  group = c('group1', 'group1', 'group2', 'group2', 'group1', 'group2', 
            'group2', 'group2', 'group2', 'group1', 'group1', 'group2', 
            'group2', 'group1', 'group1', 'group1', 'group1', 'group1', 
            'group1', 'group1', 'group2', 'group2', 'group2', 'group2', 
            'group1'),
  var1 = c(7.3333333, 3.0000000, 1.6666667, 0.6666667, 0.6666667, 1.0000000,
           0.0000000, 2.0000000, 1.0000000, 2.6666667, 1.3333333, 1.0000000,
           0.6666667, 1.0000000, 1.6666667, 5.6666667, 0.3333333, 3.3333333,
           4.3333333, 1.6666667, 0.6666667, 12.6666667, 1.0000000, 3.0000000, 
           1.0000000),
  var2 = c(6.0, 2.5, 0.0, 0.0, 0.0, 1.0, 0.5, 0.5, 0.0, 3.5, 0.0, 0.0, 0.5, 0.5,
           0.5, 2.0, 0.0, 0.0, 0.5, 0.0, 0.5, 7.5, 0.0, 3.5, 1.0),
  var3 = c(4.5833333, 3.7500000, 0.5000000, 1.0000000, 0.5000000, 0.5000000,
           1.0000000, 1.6666667, 0.8333333, 1.5000000, 1.5000000, 0.5000000,
           0.2500000, 1.3333333, 0.5000000, 3.2500000, 0.0000000, 0.7500000, 
           2.5000000, 1.0000000, 0.7500000, 3.5000000, 0.2500000, 1.0000000,
           0.0000000)
)

# ---- Single model ------------------------------------------------------------

# Prepare the recipe
rec <- recipe(group ~ . , 
              data = df) 

# Define the logistic regression model
EN_mod <- logistic_reg(penalty = 0.001, mixture = 0) %>% 
  set_mode("classification") %>% 
  set_engine("glmnet")

# Create a workflow which contains the recipe and the model
ckd_wflow <-
  workflow() %>%
  add_recipe(rec) %>%
  add_model(EN_mod)

# Fit the workflow
ckd_fit <- 
  ckd_wflow %>% 
  fit(data = df)

# View the model
ckd_fit %>%
  extract_fit_parsnip() %>%
  tidy()

# Create LOOCV folds
folds <- vfold_cv(df, v = nrow(df))

# Function to get model coefficients
get_glmnet_coefs <- function(x) {
  x %>%
    extract_fit_engine() %>%
    tidy(return_zeros = TRUE) %>%
    rename(penalty = lambda)
}

# Perform resampling workflow.
resamp <- 
  ckd_wflow %>% 
  fit_resamples(
    resamples = folds, 
    control = control_resamples(save_pred = TRUE,
                                extract = get_glmnet_coefs))

# Collect metrics for the resampling object. It only shows accuracy since it's 
# LOOCV; we have to extract the predictions for each fold and combine them 
# ourselves to get a ROC AUC.
collect_metrics(resamp)

# Extract predictions from resampling object
preds <- bind_rows(resamp$.predictions)

# View preds for .row = 1 for comparison to tuning model
preds %>%
  filter(.row == 1)

# Generate ROC curve and print AUC for resampling
autoplot(roc_curve(data = preds, truth = group, .pred_group2))
print(roc_auc(data = preds, truth = group, .pred_group2))


# ---- Tune --------------------------------------------------------------------

# Define the penalty values to test
pen_vals <- c(0.001)

# Define the grid of penalty, and mixtures values to test
grid <- crossing(penalty = pen_vals,
                 mixture = c(0))

# Define recipe for tuning 
rec_tune <- recipe(group ~ . ,
                   data = df)

# Define elastic net model for tuning 
EN_mod_tune <- 
  logistic_reg(penalty = tune(), mixture = tune()) %>% 
  set_engine("glmnet", path_values = pen_vals) %>% 
  set_mode("classification")

# Define workflow for tuning
ckd_wflow_tune <-
  ckd_wflow %>% 
  update_model(EN_mod_tune) %>%
  update_recipe(rec_tune)

# Define control grid to extract coefficients and save predictions
parsnip_ctrl <- control_grid(extract = get_glmnet_coefs,
                             save_pred = TRUE)

# Perform tuning with loocv
glmnet_res <- 
  ckd_wflow_tune %>% 
  tune_grid(
    resamples = folds,
    grid = grid,
    control = parsnip_ctrl
  )

# Get accuracy
glmnet_res %>%
  collect_metrics() %>%
  filter(.metric == 'accuracy')

# Get predictions for .row = 1 
glmnet_res %>%
  collect_predictions() %>%
  filter(.row == 1)

# Get loocv AUC
glmnet_res %>%
  collect_predictions() %>%
  dplyr::group_by(mixture, penalty, .config) %>%
  summarise(loocv_auc = roc_auc_vec(truth = group, estimate = .pred_group2))

# Print model estimates for Fold25 which is for .row == 1
glmnet_res %>%
  unnest(.extracts) %>%
  rename(penalty1 = penalty) %>%
  unnest(.extracts) %>%
  select(id, term, estimate) %>%
  filter(id == 'Fold25')

# For the single model I don't know how to identify which fold held out .row == 1
# (one possible way to look this up is sketched after this block), but none of the
# intercepts equal 0.538, which is what it is for Fold25 of the tuned model,
# suggesting the underlying LOO-sample models are different
resamp %>%
  unnest(.extracts) %>%
  unnest(.extracts) %>%
  select(id, term, estimate) %>%
  filter(term == '(Intercept)',
         estimate == 0.568)
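
One way that might work to look up which fold held out .row == 1 (a sketch only; with save_pred = TRUE, collect_predictions() keeps the fold id alongside .row, and the column selection here is just illustrative):

# Sketch: map a held-out row back to its fold via the saved predictions
collect_predictions(resamp) %>%
  filter(.row == 1) %>%
  select(id, .row, .pred_group2)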

I think that there are a few things going on here.

First, if you want to get equal models, make sure to set the path_values argument to the same sequence of values (and not a single value).
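
For example, something like this (a sketch only; the lambda_path name and values are just placeholders, the point is that both the single fit and the tuned workflow share the same path):

# Sketch: share one lambda path between the single model and the tuning model
lambda_path <- 10^seq(-3, 0, length.out = 20)

EN_mod <- logistic_reg(penalty = 0.001, mixture = 0) %>% 
  set_engine("glmnet", path_values = lambda_path) %>% 
  set_mode("classification")

EN_mod_tune <- logistic_reg(penalty = tune(), mixture = tune()) %>% 
  set_engine("glmnet", path_values = lambda_path) %>% 
  set_mode("classification")

That way glmnet fits the same regularization path in both cases, so the penalty = 0.001 solution should come from the same fitted path.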

Second, we don't really support LOOCV in any format and we should not allow

 vfold_cv(df,  v = nrow(df))

to work.

For LOOCV, you get n separate predictions and compute one performance measure from them. With the vfold_cv() code above, it is computing n performance measures, each based on n = 1 data points. These are not the same thing, and we should throw an error if v is too large (relative to the number of rows in the training set).
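
To make that distinction concrete, here is a sketch using the resampling object from the question (pooled is just an illustrative name):

# Sketch: pooled vs. averaged performance, using the resamp object from above
pooled <- collect_predictions(resamp)

# One accuracy computed from all n held-out predictions (the LOOCV-style estimate)
accuracy(pooled, truth = group, estimate = .pred_class)

# n accuracies, one per single-row fold, then averaged -- this is what
# collect_metrics() reports; for accuracy the two happen to agree, but a metric
# like ROC AUC cannot be computed on a single held-out row at all
pooled %>%
  group_by(id) %>%
  accuracy(truth = group, estimate = .pred_class) %>%
  summarise(mean_acc = mean(.estimate))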
