My question is about the output of collect_metrics and rmse_vec. I expected them to be identical, but they are not. Here is a quick example:
# Load libraries and read data ----
library(tidyverse)
library(tidymodels)
library(stacks)
# Data Splitting & Setting CV ----
set.seed(1234)
diamonds <- diamonds %>%
  slice_sample(n = 2000)
diamonds_split <- initial_split(diamonds, prop = 0.80, strata="price")
diamonds_train <- training(diamonds_split)
diamonds_test <- testing(diamonds_split)
folds <- vfold_cv(diamonds_train, v = 10, strata="price")
metric <- metric_set(rmse)
ctrl_grid <- control_stack_grid()
# Models : Linear Regression ----
linreg_spec <-
  linear_reg(engine = "lm")
linreg_rec <-
  recipe(price ~ ., data = diamonds_train) %>%
  step_dummy(all_nominal_predictors())
linreg_wflow <-
  workflow() %>%
  add_model(linreg_spec) %>%
  add_recipe(linreg_rec)
linreg_res <-
  fit_resamples(
    linreg_wflow,
    resamples = folds,
    metrics = metric,
    control = ctrl_grid
  )
collect_metrics(linreg_res)
# Models : Decision Trees ----
tree_spec <-
  decision_tree(
    mode = "regression",
    tree_depth = tune("depth"),
    engine = "rpart"
  )
tree_rec <-
  recipe(price ~ ., data = diamonds_train)
tree_wflow <-
  workflow() %>%
  add_model(tree_spec) %>%
  add_recipe(tree_rec)
tree_grid <- expand.grid(depth = c(3, 5))
tree_res <-
  tune_grid(
    tree_wflow,
    resamples = folds,
    metrics = metric,
    grid = tree_grid,
    control = ctrl_grid
  )
collect_metrics(tree_res)
# Create the stack ----
diamonds_st <-
  stacks() %>%
  add_candidates(linreg_res) %>%
  add_candidates(tree_res)
# RMSE of every column in the stack against price
# (the price column itself is compared with itself, so it gives 0)
diamonds_tbl <- diamonds_st %>%
  as_tibble()
diamonds_tbl <- diamonds_tbl %>%
  map(rmse_vec, truth = diamonds_tbl$price) %>%
  as_tibble()
diamonds_tbl
From collect_metrics I get an RMSE of 1224 for the linear regression and 1388 and 1290 for the two trees. Using rmse_vec at the very end of the code I get 1243, 1399 and 1301. The values are close, but shouldn't they be exactly the same?
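One possible source of the gap, sketched below in base R with made-up data: as far as I understand, collect_metrics() reports the mean of the per-fold RMSEs, whereas rmse_vec() applied to the stacked out-of-fold predictions gives a single RMSE over all rows pooled together. These two quantities are generally not equal. The fold sizes, error scales and the helper `rmse()` here are hypothetical and only illustrate the arithmetic, not what tune or stacks do internally.

```r
# Hypothetical toy data: 10 folds with 160 held-out rows each.
set.seed(1)
n_folds <- 10
fold_id <- rep(seq_len(n_folds), each = 160)
truth <- rnorm(length(fold_id), mean = 4000, sd = 3000)

# Let the error scale differ from fold to fold (an assumption for illustration).
fold_sd <- seq(800, 1700, length.out = n_folds)
estimate <- truth + rnorm(length(fold_id), sd = fold_sd[fold_id])

# Plain RMSE helper (hypothetical name, base R only).
rmse <- function(truth, estimate) sqrt(mean((truth - estimate)^2))

# (a) RMSE inside each fold, then averaged across folds.
per_fold_rmse <- sapply(
  split(seq_along(fold_id), fold_id),
  function(idx) rmse(truth[idx], estimate[idx])
)
mean_of_fold_rmse <- mean(per_fold_rmse)

# (b) One RMSE over all held-out rows pooled together.
pooled_rmse <- rmse(truth, estimate)

print(c(mean_of_folds = mean_of_fold_rmse, pooled = pooled_rmse))
```

With equal fold sizes the pooled value is the root mean square of the per-fold RMSEs, so it can never be smaller than their plain average, which would match the direction of the differences I see. If this is the cause, collect_metrics(linreg_res, summarize = FALSE) should show the per-fold values that get averaged.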
Thanks in advance for your help