See the training data at this GitHub Gist. I'm trying out a very simple model with just a spline interacted with a nominal variable. Using step_bs, I do:
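library(tidymodels)

# B-spline on date (both spline parameters tuned), interacted with the film dummies
rec_bs <- recipe(stars ~ date + film, dat_train) %>%
  step_bs(date, deg_free = tune(), degree = tune()) %>%
  step_dummy(film) %>%
  step_interact(~ starts_with("date"):starts_with("film"))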
Note I'm tuning both the degrees of freedom (deg_free) and the degree of the polynomial (degree).
My model is just vanilla OLS:
lm_mod <- linear_reg() %>%
  set_engine("lm")
And the workflow:
wf_bs <- workflow() %>%
  add_model(lm_mod) %>%
  add_recipe(rec_bs)
I do some cross-validation to get the best parameters:
grid_bs <- tibble(deg_free = rep(4:10, 3), degree = rep(2:4, each = 7))
folds <- vfold_cv(dat_train, v = 10)
cv_bs <- wf_bs %>%
  tune_grid(resamples = folds, grid = grid_bs)
best_bs <- select_by_one_std_err(cv_bs, deg_free, degree, metric = "rsq")
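For reference, grid_bs is just the full cross of deg_free in 4:10 and degree in 2:4. A minimal sketch of the same 21-row grid built with base R's expand.grid(), which varies the first argument fastest; grid_alt and grid_rep are names made up for this comparison and are not used elsewhere:

```r
# Full cross of the two tuning parameters; deg_free varies fastest
grid_alt <- expand.grid(deg_free = 4:10, degree = 2:4)

# The rep()-based construction from above, as a plain data frame
grid_rep <- data.frame(
  deg_free = rep(4:10, 3),
  degree   = rep(2:4, each = 7)
)

# Same number of rows and the same combinations in the same order
nrow(grid_alt)
all(grid_alt$deg_free == grid_rep$deg_free)
all(grid_alt$degree == grid_rep$degree)
```

Either form should be usable as the grid argument to tune_grid(), since both are data frames with one column per tuning parameter.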
So now note that best_bs looks like:
> best_bs
# A tibble: 1 x 9
  deg_free degree .metric .estimator  mean     n std_err .best .bound
     <int>  <int> <chr>   <chr>      <dbl> <int>   <dbl> <dbl>  <dbl>
1        4      2 rsq     standard   0.189    10 0.00729 0.190  0.183
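The .best and .bound columns come from the one-standard-error rule. Below is a rough base-R sketch of that rule as I understand it (not tune's actual implementation), for a metric that is maximised, like rsq: take the best mean, subtract that row's standard error to get the bound, and among the candidates at or above the bound pick the simplest one according to the sort columns. The function name one_std_err_pick and the numbers in toy are made up for illustration and are not my real results:

```r
# Sketch of the one-standard-error rule for a metric that is maximised.
# `res` must have columns deg_free, degree, mean and std_err.
one_std_err_pick <- function(res) {
  best_row <- res[which.max(res$mean), ]           # numerically best candidate
  bound <- best_row$mean - best_row$std_err        # one std. error below the best
  within <- res[res$mean >= bound, ]               # candidates "as good as" the best
  within <- within[order(within$deg_free, within$degree), ]  # simplest first
  within[1, ]
}

# Hypothetical numbers, for illustration only
toy <- data.frame(
  deg_free = c(4, 5, 6),
  degree   = c(2, 2, 2),
  mean     = c(0.189, 0.190, 0.170),
  std_err  = c(0.007, 0.007, 0.007)
)

one_std_err_pick(toy)
```

With these toy values the best mean belongs to deg_free = 5, but deg_free = 4 is within one standard error of it and is simpler, so that row is returned.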
So, I want to finalize the workflow with both deg_free and degree. But I get an error:
> wf_bs %>%
+   finalize_workflow(best_bs) # erroring here
Error in names(param) <- pset$name :
  'names' attribute [2] must be the same length as the vector [1]
In addition: Warning message:
In pset$component_id == step_ids :
  longer object length is not a multiple of shorter object length
It looks like it's expecting only one parameter, but I already told it to tune for two. You can see this by calling parameters():
> parameters(wf_bs)
Collection of 2 parameters for tuning

       id parameter type object class
 deg_free       deg_free    nparam[+]
   degree         degree    nparam[+]
What's weird is that if you give it just one column, it works fine:
> wf_bs %>%
+   finalize_workflow(best_bs[1])
══ Workflow ═══════════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()
── Preprocessor ───────────────────────────────────────────────────────────────────────
3 Recipe Steps
● step_bs()
● step_dummy()
● step_interact()
── Model ──────────────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)
Computational engine: lm
> # predict ----------------------------------------------------------------------
> wf_bs %>%
+   finalize_workflow(best_bs[2])
══ Workflow ═══════════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()
── Preprocessor ───────────────────────────────────────────────────────────────────────
3 Recipe Steps
● step_bs()
● step_dummy()
● step_interact()
── Model ──────────────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)
Computational engine: lm
Any ideas on what's going on here, or how I can tell it that I want to supply both deg_free and degree? I get the same behavior if I use a list instead of a tibble:
wf_bs %>%
  finalize_workflow(list(degree = 2))
That works like above, but again I see an error when I try to supply both:
> wf_bs %>%
+   finalize_workflow(list(degree = 2, deg_free = 4))
Error in names(param) <- pset$name :
  'names' attribute [2] must be the same length as the vector [1]
In addition: Warning message:
In pset$component_id == step_ids :
  longer object length is not a multiple of shorter object length