In tidymodels, how do I fit a bootstrap-resampled RF workflow to new data for predictions?

ramakant · March 11, 2023, 7:58pm

Noob here, trying to learn ML using tidymodels. I'm using the Titanic dataset to build a workflow that trains on the existing data and generates predictions for the test data. In my previous attempt, I successfully generated predictions using a decision tree by splitting the data into train and test sets (80/20). Now I've attempted a bootstrapped random forest, but I'm running into a problem when it comes to predicting the results. Seeking advice from experienced R folks who may be able to point out my shortcomings. Sharing the relevant code and the error message below:

library(tidymodels)

given_data <- read.csv("train.csv", header = TRUE)  # titanic train data
to_predict <- read.csv("test.csv", header = TRUE)   # passenger data to predict survival

#avoiding data manipulation code for brevity

titanic_recipe <- recipe(survived ~ ., data = given_data) %>% 
  step_dummy(all_nominal_predictors()) %>% 
  step_normalize(all_numeric_predictors()) %>% 
  update_role(passenger_id, new_role = "id_variable")

titanic_folds <- bootstraps(data = given_data, 
                            times = 25)

rf_model <- rand_forest(trees = 1000) %>% 
  set_engine("ranger") %>% 
  set_mode("classification")

rf_wf <- workflow() %>% 
  add_model(rf_model) %>% 
  add_recipe(titanic_recipe) %>% 
  fit_resamples(titanic_folds, control = control_resamples(save_pred = TRUE))
collect_metrics(rf_wf)

# A tibble: 2 × 6
  .metric  .estimator  mean     n std_err .config             
  <chr>    <chr>      <dbl> <int>   <dbl> <chr>               
1 accuracy binary     0.814    25 0.00432 Preprocessor1_Model1
2 roc_auc  binary     0.865    25 0.00324 Preprocessor1_Model1

I was able to arrive at a roc_auc of 0.865. Now I'd like to fit this to the test data for submission, but I'm getting the following error:

rf_predict <- predict(fit(rf_wf, data = given_data), to_predict)
Error in UseMethod("fit") : 
  no applicable method for 'fit' applied to an object of class "c('resample_results', 'tune_results', 'tbl_df', 'tbl', 'data.frame')"

I'm clearly missing a step before the predict(fit(...)) call. I'd be glad to know how to arrive at predictions for a new dataset with this existing workflow.

hannah · March 13, 2023, 10:58am

Your rf_wf object is no longer a workflow: because the pipe ends in fit_resamples(), rf_wf holds the resampling results (class resample_results), and there is no fit() method for that class. That's why you get an error when you try to fit it. Instead, keep an unfitted workflow object and fit it to the different datasets (the resamples vs. the full training data), for example, like this:

rf_wf <- workflow() %>% 
  add_model(rf_model) %>% 
  add_recipe(titanic_recipe) 

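# Resample only to estimate performance; this returns results, not a fitted workflow
resampled_wf <- rf_wf %>% 
  fit_resamples(titanic_folds, control = control_resamples(save_pred = TRUE))
collect_metrics(resampled_wf)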

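# Fit the same unfitted workflow once on the full training data
wf_fitted_to_training <- rf_wf %>% 
  fit(given_data)

# Predict survival for the new passengers
rf_predictions <- predict(wf_fitted_to_training, to_predict)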
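
If it helps to see the whole pattern run end to end, here is a minimal self-contained sketch. It does not use the real Titanic files: toy_train, toy_new, and their columns are made-up stand-ins, the small trees and times values are only there to keep it quick, and it assumes the ranger package is installed. It shows that the resampling result and the workflow are different kinds of objects, and one possible way to line the predictions up with the passenger IDs for a submission file.

```r
library(tidymodels)  # attaches workflows, recipes, parsnip, rsample, tune, dplyr, tibble

set.seed(123)

# Hypothetical toy data standing in for train.csv / test.csv (not the real columns)
n <- 200
toy_train <- tibble(
  passenger_id = seq_len(n),
  age  = runif(n, 1, 80),
  fare = runif(n, 5, 100),
  sex  = factor(sample(c("female", "male"), n, replace = TRUE))
) %>%
  mutate(survived = factor(if_else(sex == "female" | age < 12, "yes", "no")))

toy_new <- tibble(
  passenger_id = 1001:1010,
  age  = runif(10, 1, 80),
  fare = runif(10, 5, 100),
  sex  = factor(sample(c("female", "male"), 10, replace = TRUE),
                levels = c("female", "male"))
)

toy_recipe <- recipe(survived ~ ., data = toy_train) %>%
  update_role(passenger_id, new_role = "id_variable") %>%
  step_dummy(all_nominal_predictors()) %>%
  step_normalize(all_numeric_predictors())

rf_spec <- rand_forest(trees = 100) %>%
  set_engine("ranger") %>%
  set_mode("classification")

# Keep the workflow unfitted so it can be reused
wf <- workflow() %>%
  add_recipe(toy_recipe) %>%
  add_model(rf_spec)

# Step 1: resampling gives performance estimates, not a model to predict with
resampled <- fit_resamples(wf, resamples = bootstraps(toy_train, times = 5))
collect_metrics(resampled)
class(resampled)  # includes "resample_results", not "workflow"

# Step 2: fit the same workflow once on all of the training data
final_fit <- fit(wf, data = toy_train)

# Step 3: predict on the new data and keep the IDs alongside the predictions
submission <- bind_cols(
  toy_new %>% select(passenger_id),
  predict(final_fit, new_data = toy_new)
)
submission
```

With the real data, the same three steps would use given_data and to_predict in place of the toy tibbles, and the submission tibble could then be written out with write.csv().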
