You have not provided your data, so I have created an example using the mtcars dataset that ships with R. I have also used the modern {tidymodels} framework to replicate the {caret} approach shown above; {caret} has largely been superseded by {tidymodels}, although it has not been formally deprecated.
library(tidymodels)
library(themis)
set.seed(12345)
# Prepare some example data
data <-
  mtcars %>%
  mutate(vs = factor(vs))
# Split the data 70/30 for training/testing
train_test_split <- initial_split(data, prop = 0.7, strata = 'vs')
# Extract training data
train <- training(train_test_split)
# Create 5-fold 3-repeat CV splits
cv_folds <- vfold_cv(train, v = 5, repeats = 3, strata = 'vs')
# Simple recipe with ROSE
rec <- recipe(vs ~ ., data = train) %>%
  step_rose(vs)
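# Optional sanity check (not required for the workflow itself): prep and bake
# the recipe to confirm that ROSE yields a balanced training set
rec %>%
  prep() %>%
  bake(new_data = NULL) %>%
  count(vs)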
# Specify model
model <- rand_forest(
  mode = 'classification',
  engine = 'randomForest',
  mtry = tune(),
  trees = tune(),
  min_n = tune()
)
# Define workflow
wflow <- workflow(
  preprocessor = rec,
  spec = model
)
# Perform CV
res <-
  tune_grid(
    object = wflow,
    resamples = cv_folds,
    control = control_grid(save_pred = TRUE)
  )
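# Optional: inspect resampled performance for each candidate combination of
# hyperparameters before selecting one
collect_metrics(res)
show_best(res, metric = 'roc_auc')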
# Find the best hyperparameters for the randomForest model
best_hyperparams <- select_best(res, metric = 'roc_auc')
# Add those hyperparams to the workflow
final_wflow <- finalize_workflow(wflow, best_hyperparams)
# Fit the updated workflow to the whole train data
# Evaluate performance in the held-out test data
final_fit <- last_fit(final_wflow, train_test_split)
# Confusion matrix in the test data
collect_predictions(final_fit) %>%
  conf_mat(vs, .pred_class)
#>           Truth
#> Prediction 0 1
#>          0 4 2
#>          1 2 3
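If you want more than the confusion matrix, `last_fit()` also computes default test-set classification metrics (accuracy and ROC AUC in the versions I have used), which you can retrieve directly:
# Default test-set metrics stored by last_fit()
collect_metrics(final_fit)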
This should provide a strong starting point to modify for your project.
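If you later need predictions on new observations, you can extract the fitted workflow from `final_fit` and call `predict()` on it; note that `new_cars` below is just a placeholder for whatever new data you have:
# Pull out the workflow that was trained on the full training set
fitted_wflow <- extract_workflow(final_fit)
# Predict classes for new observations ('new_cars' is a hypothetical data frame)
predict(fitted_wflow, new_data = new_cars)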