Tidymodels: Tunable models involving 10-fold Cross Validation Using the Function tune_grid() in R

Kaikash777 · December 13, 2020, 2:43pm

Overview

I have produced four models using the tidymodels package with the data frame FID (see below):

General Linear Model (lasso-penalized linear regression with the glmnet engine)
Bagged Tree
Random Forest
Boosted Trees

The data frame contains three predictors:

Year (numeric)
Month (Factor)
Days (numeric)

The dependent (outcome) variable is Frequency (numeric).
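To see what the model actually receives from these predictors, here is a minimal sketch that applies a trimmed-down version of the recipe used below (only its step_novel() and step_dummy() steps) to the posted data. The data frame is rebuilt compactly from the dput() output at the end of the post. After these steps, Month is replaced by indicator columns: one for each month except the reference level January, plus Month_new for the extra "new" level that step_novel() adds.

```r
library(tidymodels)

# The posted FID data, rebuilt compactly (same values as the dput() output)
FID <- data.frame(
  Year = rep(c(2015, 2016, 2017), each = 12),
  Month = factor(rep(month.name, times = 3), levels = month.name),
  Frequency = c(36, 28, 39, 46, 5, 0, 0, 22, 10, 15, 8, 33,
                33, 29, 31, 23, 8, 9, 7, 40, 41, 41, 30, 30,
                44, 37, 41, 42, 20, 0, 7, 27, 35, 27, 43, 38),
  Days = c(31, 28, 31, 30, 6, 0, 0, 29, 15, 29, 29, 31,
           31, 29, 30, 30, 7, 0, 7, 30, 30, 31, 30, 27,
           31, 28, 30, 30, 21, 0, 7, 26, 29, 27, 29, 29)
)

# Trimmed-down recipe: step_novel() adds a "new" level to Month,
# then step_dummy() replaces Month with indicator columns
rec <- recipe(Frequency ~ ., data = FID) %>%
  step_novel(all_nominal()) %>%
  step_dummy(all_nominal())

# prep() estimates the steps; bake(new_data = NULL) returns the processed training data
baked <- bake(prep(rec), new_data = NULL)
names(baked)
```

Year and Days pass through unchanged, so the model sees 14 predictor columns instead of the original three.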

The original regularization penalty was 0.1, which I picked somewhat arbitrarily. My aim is to find the best value for this penalty. The penalty is a model hyperparameter: it cannot be estimated from the data during model training, so it has to be chosen separately. I am attempting to find the best penalty by training many models with different penalty values on resampled data sets and comparing how well they perform. Consequently, I am building a new model specification for model tuning.
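The candidate penalties come from the dials package. A short sketch of what such a grid contains: penalty() is tuned on a log10 scale with a default range of 10^-10 to 1, so grid_regular(..., levels = 30) spreads 30 values evenly in log space across that range, and the arbitrary starting value of 0.1 lies inside it. The narrower range in the last step is only an illustration of the `range` argument, not a recommendation for this data.

```r
library(dials)

# Default tuning range for penalty: log10 scale from -10 to 0
penalty()

# 30 candidate penalties evenly spaced on the log10 scale
lambda_grid <- grid_regular(penalty(), levels = 30)
head(lambda_grid)

# Illustrative narrower range, 10^-4 to 10^0 (an assumption, not tuned for FID)
narrow_grid <- grid_regular(penalty(range = c(-4, 0)), levels = 30)
range(narrow_grid$penalty)
```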

I am following this tutorial:

I am getting this error message:

Error: A `model` action has already been added to this workflow.

# Output of rlang::last_error():

<error/rlang_error>
A `model` action has already been added to this workflow.
Backtrace:
  1. tune::tune_grid(...)
 10. workflows::add_model(., tune_spec_glm)
 11. workflows:::add_action(x, action, "model")
 13. workflows:::add_action_impl.action_fit(x, action, name)
 14. workflows:::check_singleton(x$fit$actions, name)
 15. workflows:::glubort("A `{name}` action has already been added to this workflow.")
Run `rlang::last_trace()` to see the full context.
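The backtrace points at workflows::add_model(): wflow_glm already contains mod_glm, and a workflow holds exactly one model, so add_model() refuses to add tune_spec_glm on top of it. A minimal sketch that reproduces this without any data, reusing the question's object names (the recipe is left out because it is not needed to trigger the error):

```r
library(tidymodels)

# Fixed-penalty spec (as in the question's first model) and a tunable spec
mod_glm <- linear_reg(penalty = 0.1, mixture = 1) %>% set_engine("glmnet")
tune_spec_glm <- linear_reg(penalty = tune(), mixture = 1) %>% set_engine("glmnet")

wflow_glm <- workflow() %>% add_model(mod_glm)

# Adding a second model to a workflow that already has one fails
err <- tryCatch(add_model(wflow_glm, tune_spec_glm), error = function(e) e)
conditionMessage(err)

# update_model() removes the existing model and adds the new one in its place
wflow_tune <- update_model(wflow_glm, tune_spec_glm)

# The updated workflow now reports penalty as a tuning parameter
tune_args(wflow_tune)
```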

If anyone can help me solve this issue, I would be deeply appreciative.

Many thanks.

R code

##Load the packages
library(tidymodels)
library(glmnet)
library(parsnip)
library(rpart.plot)
library(rpart)
library(tidyverse) # manipulating data
library(skimr) # summary statistics of the data
library(baguette) # bagged trees
library(future) # parallel processing & decrease computation time
library(xgboost) # boosted trees
library(ranger)
library(yardstick)
library(purrr)
library(forcats)


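#split this single dataset into two: a training set and a testing set
set.seed(123) # arbitrary seed so that the random split is reproducible
data_split <- initial_split(FID)
# Create data frames for the two sets from the split object (not from FID itself):
train_data <- training(data_split)
test_data  <- testing(data_split)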

# resample the data with 10-fold cross-validation (10-fold by default)
cv <- vfold_cv(train_data, v=10)

###########################################################
##Produce the recipe

rec <- recipe(Frequency ~ ., data = FID) %>% 
          step_nzv(all_predictors(), freq_cut = 0, unique_cut = 0) %>% # remove variables with zero variances
          step_novel(all_nominal()) %>% # prepares test data to handle previously unseen factor levels 
          step_medianimpute(all_numeric(), -all_outcomes(), -has_role("id vars"))  %>% # replaces missing numeric observations with the median
          step_dummy(all_nominal(), -has_role("id vars")) # dummy codes categorical variables
  
##########################################################
##Produce Models
##########################################################
##General Linear Models
##########################################################

##Produce the glm model
mod_glm<-linear_reg(mode="regression",
                       penalty = 0.1, 
                       mixture = 1) %>% 
                            set_engine("glmnet")

##Create workflow
wflow_glm <- workflow() %>% 
                add_recipe(rec) %>%
                      add_model(mod_glm)

##Fit the glm model

###########################################################################

##MODEL EVALUATION

##To estimate how well the model performs, fit it many times, once to each
##of the resampled folds, and then evaluate each fit on the held-out part
##of its fold.
##########################################################################
plan(multisession)

fit_glm <- fit_resamples(
                        wflow_glm,
                        cv,
                        metrics = metric_set(rmse, rsq),
                        control = control_resamples(save_pred = TRUE)
                        )

##Collect the held-out predictions (number of blue whale sightings) from each of the 10 folds

Blue_Predictions<-fit_glm %>% 
                    collect_predictions()

#######Tuning hyperparameters

##Estimate the best regularization penalty by training many models on
##resampled data sets and exploring how well these models perform

tune_spec_glm <- linear_reg(penalty = tune(), mixture = 1) %>%
                                          set_mode("regression") %>%
                                                       set_engine("glmnet")

tune_spec_glm

##Create a regular grid of penalty values to try, using the dials
##convenience function penalty()

lambda_grid <- grid_regular(penalty(), levels = 30)

lambda_grid

####

tune_rs <- tune_grid(
                    wflow_glm %>% add_model(tune_spec_glm),
                    cv,
                    grid = lambda_grid,
                    control = control_grid(save_pred = TRUE)
                    )

##Error message:
## Error: A `model` action has already been added to this workflow.
## Run `rlang::last_error()` to see where the error occurred.

Data frame FID (dput() output):

FID <- structure(list(Year = c(2015, 2015, 2015, 2015, 2015, 2015, 2015,
2015, 2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016, 2016, 2016,
2016, 2016, 2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017, 2017,
2017, 2017, 2017, 2017, 2017, 2017, 2017), Month = structure(c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 11L, 12L), .Label = c("January", "February", "March",
"April", "May", "June", "July", "August", "September", "October",
"November", "December"), class = "factor"), Frequency = c(36,
28, 39, 46, 5, 0, 0, 22, 10, 15, 8, 33, 33, 29, 31, 23, 8, 9,
7, 40, 41, 41, 30, 30, 44, 37, 41, 42, 20, 0, 7, 27, 35, 27,
43, 38), Days = c(31, 28, 31, 30, 6, 0, 0, 29, 15,
29, 29, 31, 31, 29, 30, 30, 7, 0, 7, 30, 30, 31, 30, 27, 31,
28, 30, 30, 21, 0, 7, 26, 29, 27, 29, 29)), row.names = c(NA,
-36L), class = "data.frame")

Max · December 13, 2020, 4:45pm

Use update_model() since you have already added one:
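A minimal sketch of how the corrected tuning step could look end to end, assuming the FID data from the question (rebuilt compactly from the dput() output). This is not the code from the reply above: the recipe is trimmed down (step_zv() stands in for the question's step_nzv() call, and the novel-level and median-imputation steps are dropped because FID has no missing values or unseen months), the seed is arbitrary, and the select_best(), finalize_workflow() and last_fit() steps are added to show how the chosen penalty would then be used. The essential change is update_model() in place of add_model(), with control_grid() as the tune_grid() counterpart of control_resamples().

```r
library(tidymodels)

# The posted FID data, rebuilt compactly (same values as the dput() output)
FID <- data.frame(
  Year = rep(c(2015, 2016, 2017), each = 12),
  Month = factor(rep(month.name, times = 3), levels = month.name),
  Frequency = c(36, 28, 39, 46, 5, 0, 0, 22, 10, 15, 8, 33,
                33, 29, 31, 23, 8, 9, 7, 40, 41, 41, 30, 30,
                44, 37, 41, 42, 20, 0, 7, 27, 35, 27, 43, 38),
  Days = c(31, 28, 31, 30, 6, 0, 0, 29, 15, 29, 29, 31,
           31, 29, 30, 30, 7, 0, 7, 30, 30, 31, 30, 27,
           31, 28, 30, 30, 21, 0, 7, 26, 29, 27, 29, 29)
)

set.seed(2020)  # arbitrary seed so the split and folds are reproducible
data_split <- initial_split(FID)
train_data <- training(data_split)
cv <- vfold_cv(train_data, v = 10)

# Simplified recipe: dummy-code Month, then drop any all-zero columns
rec <- recipe(Frequency ~ ., data = train_data) %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  step_zv(all_predictors())

# Original workflow with the fixed penalty of 0.1
mod_glm <- linear_reg(penalty = 0.1, mixture = 1) %>% set_engine("glmnet")
wflow_glm <- workflow() %>% add_recipe(rec) %>% add_model(mod_glm)

# Tunable spec: update_model() replaces mod_glm instead of adding a second model
tune_spec_glm <- linear_reg(penalty = tune(), mixture = 1) %>% set_engine("glmnet")
wflow_tune <- wflow_glm %>% update_model(tune_spec_glm)

lambda_grid <- grid_regular(penalty(), levels = 30)

tune_rs <- tune_grid(
  wflow_tune,
  resamples = cv,
  grid = lambda_grid,
  metrics = metric_set(rmse, rsq),
  control = control_grid(save_pred = TRUE)
)

# Cross-validated metrics for every candidate penalty
collect_metrics(tune_rs)

# Pick the penalty with the lowest mean RMSE and plug it into the workflow
best_penalty <- select_best(tune_rs, metric = "rmse")
final_wf <- finalize_workflow(wflow_tune, best_penalty)

# Refit on the whole training set and evaluate once on the test set
final_fit <- last_fit(final_wf, data_split)
collect_metrics(final_fit)
```

With only 27 training rows, each of the 10 assessment sets holds just two or three observations, so the per-fold rsq values in particular will be noisy.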

system · January 3, 2021, 4:45pm

This topic was automatically closed 21 days after the last reply.