Noob here, trying to learn ML using tidymodels. I'm using the Titanic dataset to build a workflow that trains on the existing data and generates predictions for the test data. In a previous attempt, I successfully generated predictions with a decision tree by splitting the data into train and test sets (80/20). Now I've attempted a bootstrapped random forest, but I'm running into a problem when it comes to predicting the results. I'm seeking advice from experienced R folks who may be able to point out where I've gone wrong. The relevant code and the error message are below:
library(tidymodels)

given_data <- read.csv("train.csv", header = TRUE) # Titanic training data
to_predict <- read.csv("test.csv", header = TRUE) # passenger data to predict survival for
# data manipulation code omitted for brevity
titanic_recipe <- recipe(survived ~ ., data = given_data) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_normalize(all_numeric_predictors()) %>%
  update_role(passenger_id, new_role = "id_variable")

titanic_folds <- bootstraps(data = given_data,
                            times = 25)

rf_model <- rand_forest(trees = 1000) %>%
  set_engine("ranger") %>%
  set_mode("classification")

rf_wf <- workflow() %>%
  add_model(rf_model) %>%
  add_recipe(titanic_recipe) %>%
  # save_pred is an option of control_resamples(), not a direct argument of fit_resamples()
  fit_resamples(titanic_folds, control = control_resamples(save_pred = TRUE))

collect_metrics(rf_wf)
# A tibble: 2 × 6
  .metric  .estimator  mean     n std_err .config
  <chr>    <chr>      <dbl> <int>   <dbl> <chr>
1 accuracy binary     0.814    25 0.00432 Preprocessor1_Model1
2 roc_auc  binary     0.865    25 0.00324 Preprocessor1_Model1
I was able to reach a roc_auc of 0.865. Now I'd like to use this model to predict on the test data for submission, but I'm getting the following error:
rf_predict <- predict(fit(rf_wf, data = given_data), to_predict)
Error in UseMethod("fit") :
no applicable method for 'fit' applied to an object of class "c('resample_results', 'tune_results', 'tbl_df', 'tbl', 'data.frame')"
I'm clearly missing a step before calling predict(fit(...)). I'd be glad to know how to arrive at predictions for a new dataset with this existing workflow.
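Going by the error message, the object passed to fit() is the output of fit_resamples() (class resample_results) rather than a workflow. One possible direction, as a minimal sketch rather than a verified solution, is to keep the unfitted workflow in its own object, use the resamples only to estimate performance, and then fit that workflow once on the full training data before predicting. The data here is a stand-in (mtcars, not the Titanic files), and the names cars, train, new_rows, rec, wf, res, final_fit and preds are made up for illustration.

```r
library(tidymodels)

set.seed(123)

# Stand-in data (assumption): mtcars with a factor outcome, not the Titanic files
cars <- mtcars %>% mutate(am = factor(am))
train <- cars[1:24, ]     # plays the role of given_data
new_rows <- cars[25:32, ] # plays the role of to_predict

rec <- recipe(am ~ ., data = train) %>%
  step_normalize(all_numeric_predictors())

rf_model <- rand_forest(trees = 100) %>%
  set_engine("ranger") %>%
  set_mode("classification")

# Keep the unfitted workflow as its own object
wf <- workflow() %>%
  add_model(rf_model) %>%
  add_recipe(rec)

# Resampling estimates performance; the result is not a fitted model
res <- fit_resamples(wf, resamples = bootstraps(train, times = 5))
collect_metrics(res)

# Fit the workflow once on all the training data, then predict on new data
final_fit <- fit(wf, data = train)
preds <- predict(final_fit, new_data = new_rows)
```

With the objects from the post, the same pattern would presumably mean stopping the pipe before fit_resamples(), calling fit() on that workflow with data = given_data, and then predict() with new_data = to_predict.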