To create a tuning grid for a set of workflows (recipe/model combinations), I run a loop that extracts each workflow's parameter set with `extract_parameter_set_dials()` and then finalizes any unknown parameter ranges with `finalize()`. This works on every workflow EXCEPT those that use an SVM model; for those, calling `finalize()` always results in an error:
```
#> Error in `map()`:
#> ℹ In index: 2.
#> Caused by error in `object$finalize()`:
#> ! The matrix version of the initialization data is not numeric.
#> Run `rlang::last_trace()` to see where the error occurred.
```
It's worth noting that every variable in the baked recipe is a double except the outcome variable, which is a factor. Here's a reprex:
```r
# Load required libraries
library(modeldata)
library(tidymodels)
library(dplyr)
# Load data
data(attrition)
# Create a recipe that ensures all predictors are numeric and normalized
base_recipe <- recipe(Attrition ~ ., data = attrition) %>%
  step_zv(all_predictors()) %>%
  step_naomit(all_predictors()) %>%
  step_corr(all_numeric_predictors(), threshold = 0.9) %>%
  step_YeoJohnson(all_numeric_predictors()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_predictors())
# list the variables and data types
base_recipe %>% prep() %>% bake(new_data = NULL) %>% glimpse()
#> Rows: 1,470
#> Columns: 59
#> $ Age <dbl> 0.521960942, 1.275977863, 0.102055073…
#> $ DailyRate <dbl> 0.75903101, -1.33414327, 1.33990843, …
#> $ DistanceFromHome <dbl> -1.49357558, 0.24333238, -1.03086414,…
#> $ HourlyRate <dbl> 1.35416941, -0.21060357, 1.26266458, …
#> $ MonthlyIncome <dbl> 0.28586776, 0.05281527, -1.44713307, …
#> $ MonthlyRate <dbl> 0.74743488, 1.39681749, -1.88197043, …
#> $ NumCompaniesWorked <dbl> 1.62077939, -0.57110745, 1.27090654, …
#> $ PercentSalaryHike <dbl> -1.4884114, 1.6791185, 0.2010641, -1.…
#> $ StockOptionLevel <dbl> -0.9316973, 0.2419060, -0.9316973, -0…
#> $ TotalWorkingYears <dbl> -0.24422094, 0.05247754, -0.41036000,…
#> $ TrainingTimesLastYear <dbl> -2.5781989, 0.2173107, 0.2173107, 0.2…
#> $ YearsAtCompany <dbl> 0.13964671, 0.76240120, -2.22884098, …
#> $ YearsInCurrentRole <dbl> 0.20549482, 0.88358757, -1.59589482, …
#> $ YearsSinceLastPromotion <dbl> -1.09449009, 0.09682314, -1.09449009,…
#> $ YearsWithCurrManager <dbl> 0.48998195, 0.90932557, -1.54963149, …
#> $ Attrition <fct> Yes, No, Yes, No, No, No, No, No, No,…
#> $ BusinessTravel_Travel_Frequently <dbl> -0.4816947, 2.0745914, -0.4816947, 2.…
#> $ BusinessTravel_Travel_Rarely <dbl> 0.6396229, -1.5623576, 0.6396229, -1.…
#> $ Department_Research_Development <dbl> -1.3735834, 0.7275275, 0.7275275, 0.7…
#> $ Department_Sales <dbl> 1.5147284, -0.6597352, -0.6597352, -0…
#> $ Education_1 <dbl> -0.89138490, -1.86779013, -0.89138490…
#> $ Education_2 <dbl> -0.04251052, 2.24372610, -0.04251052,…
#> $ Education_3 <dbl> 1.6079970, -0.5447859, 1.6079970, -1.…
#> $ Education_4 <dbl> -1.00681362, 0.07983544, -1.00681362,…
#> $ EducationField_Life_Sciences <dbl> 1.1936384, 1.1936384, -0.8372047, 1.1…
#> $ EducationField_Marketing <dbl> -0.3481364, -0.3481364, -0.3481364, -…
#> $ EducationField_Medical <dbl> -0.678910, -0.678910, -0.678910, -0.6…
#> $ EducationField_Other <dbl> -0.2429766, -0.2429766, 4.1128232, -0…
#> $ EducationField_Technical_Degree <dbl> -0.3139866, -0.3139866, -0.3139866, -…
#> $ EnvironmentSatisfaction_1 <dbl> -0.6603060, 0.2545383, 1.1693826, 1.1…
#> $ EnvironmentSatisfaction_2 <dbl> -0.9928824, -0.9928824, 1.0064835, 1.…
#> $ EnvironmentSatisfaction_3 <dbl> 1.4469968, -1.2421123, 0.5506271, 0.5…
#> $ Gender_Male <dbl> -1.2243282, 0.8162188, 0.8162188, -1.…
#> $ JobInvolvement_1 <dbl> 0.379543, -1.025818, -1.025818, 0.379…
#> $ JobInvolvement_2 <dbl> -0.4271984, -0.4271984, -0.4271984, -…
#> $ JobInvolvement_3 <dbl> -0.7783145, 1.5160483, 1.5160483, -0.…
#> $ JobRole_Human_Resources <dbl> -0.1914326, -0.1914326, -0.1914326, -…
#> $ JobRole_Laboratory_Technician <dbl> -0.4623065, -0.4623065, 2.1615955, -0…
#> $ JobRole_Manager <dbl> -0.2729664, -0.2729664, -0.2729664, -…
#> $ JobRole_Manufacturing_Director <dbl> -0.3306955, -0.3306955, -0.3306955, -…
#> $ JobRole_Research_Director <dbl> -0.2398224, -0.2398224, -0.2398224, -…
#> $ JobRole_Research_Scientist <dbl> -0.4977039, 2.0078601, -0.4977039, 2.…
#> $ JobRole_Sales_Executive <dbl> 1.8726493, -0.5336396, -0.5336396, -0…
#> $ JobRole_Sales_Representative <dbl> -0.2445418, -0.2445418, -0.2445418, -…
#> $ JobSatisfaction_1 <dbl> 1.1528613, -0.6606284, 0.2461164, 0.2…
#> $ JobSatisfaction_2 <dbl> 0.9821324, -1.0175000, -1.0175000, -1…
#> $ JobSatisfaction_3 <dbl> 0.5496309, 1.4543985, -1.2599042, -1.…
#> $ MaritalStatus_Married <dbl> -0.9186088, 1.0878621, -0.9186088, 1.…
#> $ MaritalStatus_Single <dbl> 1.4581537, -0.6853322, 1.4581537, -0.…
#> $ OverTime_Yes <dbl> 1.5912040, -0.6280274, 1.5912040, 1.5…
#> $ PerformanceRating_1 <dbl> -0.426085, 2.345353, -0.426085, -0.42…
#> $ PerformanceRating_2 <dbl> -0.426085, 2.345353, -0.426085, -0.42…
#> $ PerformanceRating_3 <dbl> -0.426085, 2.345353, -0.426085, -0.42…
#> $ RelationshipSatisfaction_1 <dbl> -1.5836393, 1.1910327, -0.6587487, 0.…
#> $ RelationshipSatisfaction_2 <dbl> 1.037082, 1.037082, -0.963588, -0.963…
#> $ RelationshipSatisfaction_3 <dbl> -0.3486405, 0.5365090, 1.4216585, -1.…
#> $ WorkLifeBalance_1 <dbl> -2.4929720, 0.3379811, 0.3379811, 0.3…
#> $ WorkLifeBalance_2 <dbl> 2.3033457, -0.4338557, -0.4338557, -0…
#> $ WorkLifeBalance_3 <dbl> 0.02755972, -0.75153240, -0.75153240,…
```
The random forest works fine (as do others):
```r
# define a random forest
rf <- rand_forest(
  mtry = tune(),
  trees = tune()
) %>%
  set_engine("ranger") %>%
  set_mode("classification")
# Create a workflow
rf_workflow <- workflow() %>%
  add_recipe(base_recipe) %>%
  add_model(rf)
# Extract and finalize the parameters
rf_params <- extract_parameter_set_dials(rf_workflow)
rf_params
#> Collection of 2 parameters for tuning
#>
#> identifier type object
#> mtry mtry nparam[?]
#> trees trees nparam[+]
#>
#> Model parameters needing finalization:
#> # Randomly Selected Predictors ('mtry')
#>
#> See `?dials::finalize` or `?dials::update.parameters` for more information.
# finalize the parameters
rf_params_finalized <- finalize(rf_params, attrition)
rf_params_finalized
#> Collection of 2 parameters for tuning
#>
#> identifier type object
#> mtry mtry nparam[+]
#> trees trees nparam[+]
```
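As a side observation on the data passed to `finalize()` here (not the cause of the SVM error): the dials documentation describes `x` as the predictor data, and `mtry()` takes its upper bound from the number of columns in `x`. A minimal sketch, using only dials and the raw `attrition` data, shows that passing the whole data frame also counts the `Attrition` outcome column:

```r
library(dials)
library(modeldata)
data(attrition)

# mtry's upper bound is unknown until finalize() fills it in from ncol(x)
with_outcome <- finalize(mtry(), attrition)
predictors_only <- finalize(mtry(), attrition[, names(attrition) != "Attrition"])

with_outcome$range$upper     # counts the outcome column as well
predictors_only$range$upper  # counts the raw predictors only
```

Note that with the recipe above, the baked data has 58 predictors (59 columns in the `glimpse()` output minus the outcome), so even the predictors-only raw count differs from what the model actually sees after `step_dummy()`.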
And now the SVM:
```r
# Define the SVM model specification
svm <- svm_rbf(
  cost = tune(),
  rbf_sigma = tune()
) %>%
  set_engine("kernlab") %>%
  set_mode("classification")
# Create a workflow
svm_workflow <- workflow() %>%
  add_recipe(base_recipe) %>%
  add_model(svm)
# Extract and finalize the parameters
svm_params <- extract_parameter_set_dials(svm_workflow)
svm_params
#> Collection of 2 parameters for tuning
#>
#> identifier type object
#> cost cost nparam[+]
#> rbf_sigma rbf_sigma nparam[+]
svm_params_finalized <- finalize(svm_params, attrition)
#> Error in `map()`:
#> ℹ In index: 2.
#> Caused by error in `object$finalize()`:
#> ! The matrix version of the initialization data is not numeric.
```
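The error refers to a "matrix version of the initialization data", and `In index: 2` points at the second parameter in the set, `rbf_sigma`. Assuming that parameter's finalizer converts `x` with `as.matrix()` and checks that the result is numeric (an assumption based on the message text, not a reading of the dials source), the raw `attrition` data frame passed above would fail that check, because it still contains factor columns, the outcome among them. The baked recipe output, by contrast, is all doubles apart from `Attrition`. A quick check:

```r
library(modeldata)
data(attrition)

# Columns of the raw data (the object passed to finalize()) that are not numeric
non_numeric <- names(attrition)[!vapply(attrition, is.numeric, logical(1))]
print(non_numeric)

# as.matrix() on a data frame with factor columns returns a character matrix,
# so a "numeric matrix" check on this data fails
is.numeric(as.matrix(attrition))  # FALSE
```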
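The SVM parameters shouldn't need finalizing, since neither `cost` nor `rbf_sigma` has an unknown range (both show `nparam[+]`). However, because `finalize()` is called inside a loop over all the workflows, I don't see an easy way to skip it. It also seems that calling it on an already-complete parameter set shouldn't raise an error.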
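One way to keep the loop generic is sketched below. It assumes the installed dials version supports the `force` argument of `finalize()` (documented as controlling whether parameters that are already complete get updated anyway; check `?dials::finalize`), and it uses a hypothetical helper, `finalize_wflow_params()`, that is not part of tidymodels. The helper passes the baked, outcome-free predictors instead of the raw data, so any finalizer that needs numeric input receives it, while `force = FALSE` leaves complete parameters such as `cost` and `rbf_sigma` untouched. The recipe is a shortened version of the one above so the block runs on its own.

```r
library(tidymodels)
library(modeldata)
data(attrition)

# Shortened version of the question's recipe (assumption: enough for the sketch)
rec <- recipe(Attrition ~ ., data = attrition) %>%
  step_zv(all_predictors()) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_predictors())

rf_wflow <- workflow() %>%
  add_recipe(rec) %>%
  add_model(
    rand_forest(mtry = tune(), trees = tune()) %>%
      set_engine("ranger") %>%
      set_mode("classification")
  )

svm_wflow <- workflow() %>%
  add_recipe(rec) %>%
  add_model(
    svm_rbf(cost = tune(), rbf_sigma = tune()) %>%
      set_engine("kernlab") %>%
      set_mode("classification")
  )

# Hypothetical helper: finalize a workflow's parameters against the baked
# predictors (outcome dropped), skipping parameters that are already complete
finalize_wflow_params <- function(wflow) {
  params <- extract_parameter_set_dials(wflow)
  baked_predictors <- extract_preprocessor(wflow) %>%
    prep() %>%
    bake(new_data = NULL, all_predictors())
  finalize(params, baked_predictors, force = FALSE)
}

# The same call now works for every workflow in the loop
finalized <- lapply(list(rf = rf_wflow, svm = svm_wflow), finalize_wflow_params)
```

With the default `force = TRUE` and numeric predictors, `rbf_sigma` would instead have its range re-estimated from the data; whether that is preferable to its default range is a modeling choice, which is why the sketch opts for `force = FALSE`.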
Any insights or suggestions?
Thank you!