Hi tidymodels team,
I built a model with the ranger engine for a classification task and tuned it, which gave me the best mtry and min_n (rf_best). I then used last_fit() on the split object to fit the final model on the full training set with those values; the final-model code is below, right after a quick sketch of how I picked rf_best:
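Roughly, the selection step was the following (a sketch from memory; the object name rf_res and the "roc_auc" metric are my reconstruction, not copied verbatim from my script):
# pick the best mtry / min_n combination from the tuning results
# (`rf_res` is my tune_grid() result; "roc_auc" is the metric I tuned on)
rf_best <- rf_res %>% select_best(metric = "roc_auc")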
# the last model
last_rf_mod <-
  rand_forest(mtry = rf_best$mtry, min_n = rf_best$min_n, trees = 1000) %>%
  set_engine("ranger", num.threads = cores, importance = "impurity") %>%
  set_mode("classification")

last_rf_workflow <-
  rf_workflow %>%
  update_model(last_rf_mod)

set.seed(10086)
last_rf_fit <-
  last_rf_workflow %>%
  last_fit(split = splits)
Then I used collect_metrics() to get the model performance on the test set:
last_rf_fit %>% collect_metrics()
However, I also tried the fit() and predict() functions to evaluate the performance on the test data:
last_rf_mod <-
  rand_forest(mtry = rf_best$mtry, min_n = rf_best$min_n, trees = 1000) %>%
  set_engine("ranger", num.threads = cores, importance = "impurity") %>%
  set_mode("classification")
set.seed(10086)
rf_cls_fit <- last_rf_mod %>% fit(outcome ~ ., data = TrainSet)
rf_cls_fit
predict_res <- bind_cols(
  predict(rf_cls_fit, TestSet),
  predict(rf_cls_fit, TestSet, type = "prob")
)
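Then I calculated the AUC with the roc_auc() function, roughly like this (a sketch from memory; the outcome column name and the .pred_Yes probability column are my assumptions about my own data, since the probability column is named after the event level):
# add the true class from the test set, then compute the test-set AUC
predict_res <- bind_cols(predict_res, TestSet %>% select(outcome))
predict_res %>% roc_auc(truth = outcome, .pred_Yes)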
But this gives a different number from the AUC in collect_metrics(). Am I doing something totally wrong? Looking forward to your reply!
Best,
Ben