Overview
I have produced four models using the tidymodels package with the data frame FID (see below):
- General Linear Model
- Bagged Tree
- Random Forest
- Boosted Trees
The data frame contains three predictors :
- Year (numeric)
- Month (Factor)
- Days (numeric)
The dependent variable is Frequency (numeric)
The original penalty was 0.1 for regularization, which I picked somewhat arbitrarily. My aim is to estimate the right or best regularization parameter penalty. The idea is to estimate the model hyperparameter (best value model), which cannot be assessed during model training. I am attempting to estimate the best penalty value by training many models on resampled data sets and exploring how well they perform. Consequently, I am building a new model specification for model tuning.
I am following this tutorial:-
I am experiencing this error message
Error: A `model` action has already been added to this workflow.
#Run rlang::last_error()
<error/rlang_error>
A `model` action has already been added to this workflow.
Backtrace:
1. tune::tune_grid(...)
10. workflows::add_model(., tune_spec_glm)
11. workflows:::add_action(x, action, "model")
13. workflows:::add_action_impl.action_fit(x, action, name)
14. workflows:::check_singleton(x$fit$actions, name)
15. workflows:::glubort("A `{name}` action has already been added to this workflow.")
Run `rlang::last_trace()` to see the full context.
If anyone can help me solve this issue, I would be deeply appreciative.
Many thanks.
R-code
##Open the tidymodels package
library(tidymodels)
library(glmnet)
library(parsnip)
library(rpart.plot)
library(rpart)
library(tidyverse) # manipulating data
library(skimr) # data visualization
library(baguette) # bagged trees
library(future) # parallel processing & decrease computation time
library(xgboost) # boosted trees
library(ranger)
library(yardstick)
library(purrr)
library(forcats)
#split this single dataset into two: a training set and a testing set
data_split <- initial_split(FID)
# Create data frames for the two sets:
train_data <- training(FID)
test_data <- testing(FID)
# resample the data with 10-fold cross-validation (10-fold by default)
cv <- vfold_cv(train_data, v=10)
###########################################################
##Produce the recipe
rec <- recipe(Frequency ~ ., data = FID) %>%
step_nzv(all_predictors(), freq_cut = 0, unique_cut = 0) %>% # remove variables with zero variances
step_novel(all_nominal()) %>% # prepares test data to handle previously unseen factor levels
step_medianimpute(all_numeric(), -all_outcomes(), -has_role("id vars")) %>% # replaces missing numeric observations with the median
step_dummy(all_nominal(), -has_role("id vars")) # dummy codes categorical variables
##########################################################
##Produce Models
##########################################################
##General Linear Models
##########################################################
##Produce the glm model
mod_glm<-linear_reg(mode="regression",
penalty = 0.1,
mixture = 1) %>%
set_engine("glmnet")
##Create workflow
wflow_glm <- workflow() %>%
add_recipe(rec) %>%
add_model(mod_glm)
##Fit the glm model
###########################################################################
MODEL EVALUATION
##Estimate how well that model performs, let’s fit many times,
##once to each of these resampled folds, and then evaluate on the heldout
##part of each resampled fold.
##########################################################################
plan(multisession)
fit_glm <- fit_resamples(
wflow_glm,
cv,
metrics = metric_set(rmse, rsq),
control = control_resamples(save_pred = TRUE)
)
##Collect model predictions for each K-fold for the number of blue whale sightings
Blue_Predictions<-fit_glm %>%
collect_predictions()
#######Tuning hyperparameters
##Estimating the best regularization penalty to configure the best value model
##by estimating the best value by training many models on resamples data sets
##and exploring how well these models perform
tune_spec_glm <- linear_reg(penalty = tune(), mixture = 1) %>%
set_mode("regression") %>%
set_engine("glmnet")
tune_spec_glm
##Create a regular grid of value to try using a convenience function for
##penalty
lambda_grid <- grid_regular(penalty(), levels = 30)
lambda_grid
####
tune_rs <- tune_grid(
wflow_glm %>% add_model(tune_spec_glm),
cv,
grid = lambda_grid,
control = control_resamples(save_pred = TRUE)
)
##Error message
Error: A `model` action has already been added to this workflow.
Run `rlang::last_error()` to see where the error occurred.
Data Frame - FID
structure(list(Year = c(2015, 2015, 2015, 2015, 2015, 2015, 2015,
2015, 2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016, 2016, 2016,
2016, 2016, 2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017, 2017,
2017, 2017, 2017, 2017, 2017, 2017, 2017), Month = structure(c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L), .Label = c("January", "February", "March",
"April", "May", "June", "July", "August", "September", "October",
"November", "December"), class = "factor"), Frequency = c(36,
28, 39, 46, 5, 0, 0, 22, 10, 15, 8, 33, 33, 29, 31, 23, 8, 9,
7, 40, 41, 41, 30, 30, 44, 37, 41, 42, 20, 0, 7, 27, 35, 27,
43, 38), Days = c(31, 28, 31, 30, 6, 0, 0, 29, 15,
29, 29, 31, 31, 29, 30, 30, 7, 0, 7, 30, 30, 31, 30, 27, 31,
28, 30, 30, 21, 0, 7, 26, 29, 27, 29, 29)), row.names = c(NA,
-36L), class = "data.frame")