Hi,
While I was learning fit_resamples(), I ran into a question.
First, here is my code:
library(tidymodels)
library(tidyverse)

data(cells)

set.seed(123)
cell_split <- initial_split(cells %>% select(-case),
                            strata = class)
cell_train <- training(cell_split)
cell_test <- testing(cell_split)

set.seed(345)
folds <- vfold_cv(cell_train, v = 10)

rf_mod <-
  rand_forest(trees = 1000) %>%
  set_engine("ranger") %>%
  set_mode("classification")

rf_wf <-
  workflow() %>%
  add_model(rf_mod) %>%
  add_formula(class ~ .)

set.seed(456)
rf_fit_rs <-
  rf_wf %>%
  fit_resamples(folds)
collect_metrics(rf_fit_rs, summarize = FALSE) %>%
  filter(id == "Fold01")

  id     .metric  .estimator .estimate
1 Fold01 accuracy binary         0.822
2 Fold01 roc_auc  binary         0.892
These values in the .estimate column are the validation (assessment-set) errors for each fold, right?
To check this, I took out Fold01 myself and fit a model on its analysis set, but it did not give the same metric value (my manual check is in the code further below).
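One idea I had for checking this is to keep the per-fold predictions and recompute the metric myself. This is a minimal sketch; I am assuming control_resamples(save_pred = TRUE) and collect_predictions() behave as documented:

# refit the resamples, this time keeping the assessment-set predictions
set.seed(456)
rf_fit_rs_pred <-
  rf_wf %>%
  fit_resamples(folds, control = control_resamples(save_pred = TRUE))

# recompute the Fold01 accuracy from the saved predictions
collect_predictions(rf_fit_rs_pred) %>%
  filter(id == "Fold01") %>%
  accuracy(truth = class, estimate = .pred_class)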
An additional question: the same goes for last_fit(), but how can I check the training error there? I am thinking that by comparing the training error with the validation error, we can notice overfitting (see the sketch after my check code below).
splss <- folds %>%
  filter(id == "Fold01") %>%
  pull(splits)

ana_data <- splss[[1]] %>%
  analysis()
ass_data <- splss[[1]] %>%
  assessment()

set.seed(456)
ana_model <- rf_wf %>%
  fit(ana_data)

# I think this is the training error
predict(ana_model, ana_data) %>%
  bind_cols(class = ana_data$class) %>%
  accuracy(truth = class, estimate = .pred_class)
  .metric  .estimator .estimate
1 accuracy binary         0.993
# validation error
predict(ana_model, ass_data) %>%
  bind_cols(class = ass_data$class) %>%
  accuracy(truth = class, estimate = .pred_class)
  .metric  .estimator .estimate
1 accuracy binary         0.829
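For last_fit(), this is roughly what I have in mind. It is only a sketch; I am assuming extract_workflow() returns the workflow that last_fit() trained on the full training set:

set.seed(456)
final_fit <- last_fit(rf_wf, cell_split)

# test-set (assessment) metrics
collect_metrics(final_fit)

# training error: predict back on the data the final model was fit on
extract_workflow(final_fit) %>%
  predict(cell_train) %>%
  bind_cols(class = cell_train$class) %>%
  accuracy(truth = class, estimate = .pred_class)

Is comparing these two numbers a reasonable way to spot overfitting, or is there a more standard approach?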
Thank you!