Checking the training error with fit_resamples() and reproducing the Fold01 metrics

Hi,
While I was learning fit_resamples(), a question came up.

First, here is my code.

library(tidymodels)
library(tidyverse)

data(cells)
set.seed(123)
cell_split <- initial_split(cells %>% select(-case), 
                            strata = class)

cell_train <- training(cell_split)
cell_test  <- testing(cell_split)

set.seed(345)
folds <- vfold_cv(cell_train, v = 10)

rf_mod <- 
  rand_forest(trees = 1000) %>% 
  set_engine("ranger") %>% 
  set_mode("classification")

rf_wf <- 
  workflow() %>%
  add_model(rf_mod) %>%
  add_formula(class ~ .)

set.seed(456)

rf_fit_rs <- 
  rf_wf %>% 
  fit_resamples(folds)

collect_metrics(rf_fit_rs, summarize = FALSE) %>% 
  filter(id == "Fold01")


  id     .metric  .estimator .estimate
1 Fold01 accuracy binary         0.822
2 Fold01 roc_auc  binary         0.892


The .estimate column holds the validation (assessment-set) error, right?
To check it, I pulled out Fold01 and fit a model on its analysis set myself (code below), but it did not give the same metric values.
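As a cross-check that does not require refitting by hand, the per-fold predictions can be saved during resampling and the metric recomputed from them. A minimal sketch, reusing the folds and rf_wf objects above (control_resamples() and collect_predictions() come from the tune package):

# Keep the assessment-set predictions for each fold
rf_ctrl <- control_resamples(save_pred = TRUE)

set.seed(456)
rf_fit_rs2 <- 
  rf_wf %>% 
  fit_resamples(folds, control = rf_ctrl)

# Recompute the Fold01 accuracy from the saved predictions;
# it should equal the value reported by collect_metrics()
collect_predictions(rf_fit_rs2) %>% 
  filter(id == "Fold01") %>% 
  accuracy(truth = class, estimate = .pred_class)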

An additional question.
The same applies to last_fit(), but how can I check the training error?
I am thinking that by comparing the training error to the validation error, we can notice overfitting; a rough sketch of what I mean is below, followed by my manual check of Fold01.
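This is the kind of training-error check I have in mind for last_fit(), assuming the fitted workflow can be pulled out of the result with extract_workflow() and then used to predict the training data:

# Fit on the full training set, evaluate on the test set
final_fit <- rf_wf %>% last_fit(cell_split)

# Test-set metrics
collect_metrics(final_fit)

# Training error: predict the training data with the fitted workflow
final_wf <- extract_workflow(final_fit)

predict(final_wf, cell_train) %>% 
  bind_cols(class = cell_train$class) %>% 
  accuracy(truth = class, estimate = .pred_class)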

splss <- folds %>% 
  filter(id == "Fold01") %>% 
  pull(splits)

ana_data <- splss[[1]] %>% 
  analysis()

ass_data <- splss[[1]] %>% 
  assessment()

set.seed(456)

ana_model <- rf_wf %>% 
  fit(ana_data)

# I think this is the training error
predict(ana_model, ana_data) %>% 
  bind_cols(class = ana_data$class) %>% 
  accuracy(truth = class, estimate = .pred_class)

  .metric  .estimator .estimate
1 accuracy binary         0.993

# validation (assessment-set) error
predict(ana_model, ass_data) %>% 
  bind_cols(class = ass_data$class) %>% 
  accuracy(truth = class, estimate = .pred_class)

  .metric  .estimator .estimate
1 accuracy binary         0.829

Thank you!

I replaced the model with one that has no randomness, and the metrics now match, so the earlier mismatch came from the randomness in the random forest fit.
Thank you.
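For anyone reading along, the post does not show which model was used; here is a minimal sketch of that check, assuming logistic_reg() with the glm engine as the randomness-free model. With no randomness in the fit, the Fold01 metric from fit_resamples() should match the manual refit exactly:

lr_wf <- 
  workflow() %>% 
  add_model(logistic_reg() %>% set_engine("glm")) %>% 
  add_formula(class ~ .)

# Resampled metrics for Fold01
lr_fit_rs <- lr_wf %>% fit_resamples(folds)
collect_metrics(lr_fit_rs, summarize = FALSE) %>% 
  filter(id == "Fold01")

# Manual refit on Fold01's analysis set, scored on its assessment set
lr_model <- lr_wf %>% fit(ana_data)
predict(lr_model, ass_data) %>% 
  bind_cols(class = ass_data$class) %>% 
  accuracy(truth = class, estimate = .pred_class)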
