Question about Stacks & TidyModels

My question refers to the output of collect_metrics and rmse_vec. I think they should be identical. They are not. Here's my quick example:

# Load libraries and read data ----
library(tidyverse)
library(tidymodels)
library(stacks)

# Data Splitting & Setting CV ----

set.seed(1234)

diamonds <- diamonds %>%    
  slice_sample(n=2000)

diamonds_split <- initial_split(diamonds, prop = 0.80, strata="price")

diamonds_train <- training(diamonds_split)
diamonds_test <- testing(diamonds_split)

folds <- vfold_cv(diamonds_train, v = 10, strata="price")

metric <- metric_set(rmse)

ctrl_grid <- control_stack_grid()

# Models : Linear Regression ---- 

linreg_spec <-
  linear_reg(engine="lm") 

linreg_rec <-
  recipe(price ~ ., data = diamonds_train) %>%
  step_dummy(all_nominal_predictors()) 

linreg_wflow <- 
  workflow() %>%
  add_model(linreg_spec) %>%
  add_recipe(linreg_rec)

linreg_res <- 
  fit_resamples(
    linreg_wflow,
    resamples = folds,
    metrics = metric,
    control = ctrl_grid
  )

collect_metrics(linreg_res)

# Models : Decision Trees ----

tree_spec <- 
  decision_tree(
    mode = "regression",
    tree_depth = tune("depth"),
    engine = "rpart"
  ) 

tree_rec <-
  recipe(price ~ ., data = diamonds_train) 

tree_wflow <- 
  workflow() %>% 
  add_model(tree_spec) %>%
  add_recipe(tree_rec)

tree_grid <- expand.grid(depth = c(3, 5))

tree_res <- 
  tune_grid(
    tree_wflow,
    resamples = folds,
    metrics = metric,
    grid = tree_grid,
    control = ctrl_grid
  )

collect_metrics(tree_res)

# Create the stack ----

diamonds_st <- 
  stacks() %>%
  add_candidates(linreg_res) %>%
  add_candidates(tree_res)  

# RMSEs of all members of the stack ----

diamonds_tbl <- diamonds_st %>% 
  as_tibble() 

diamonds_tbl <- diamonds_tbl %>% 
  map(rmse_vec, truth = diamonds_tbl$price) %>%
  as_tibble() 

diamonds_tbl

From collect_metrics I get RMSEs of 1224 for the linear regression and 1388 and 1290 for the two trees (depth 3 and 5). Using rmse_vec at the very end of the code I get 1243, 1399 and 1301. They are in the same vicinity, but shouldn't they be exactly the same?

Thanks in advance for your help

Thanks for the reprex! This is a good question.

Your expectation that these values should be close makes sense. However, they are computed in different ways, so they need not match exactly.

The metrics listed under mean in these tables are the assessment-set (validation) RMSEs, computed separately within each fold and then averaged across the 10 folds.

collect_metrics(linreg_res)
#> # A tibble: 1 × 6
#>   .metric .estimator  mean     n std_err .config             
#>   <chr>   <chr>      <dbl> <int>   <dbl> <chr>               
#> 1 rmse    standard   1224.    10    71.1 Preprocessor1_Model1

collect_metrics(tree_res)
#> # A tibble: 2 × 7
#>   depth .metric .estimator  mean     n std_err .config             
#>   <dbl> <chr>   <chr>      <dbl> <int>   <dbl> <chr>               
#> 1     3 rmse    standard   1388.    10    58.1 Preprocessor1_Model1
#> 2     5 rmse    standard   1290.    10    55.9 Preprocessor1_Model2
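
To make the columns concrete, here is a minimal base R sketch of how a mean, n and std_err can be derived from per-fold estimates. The per-fold RMSEs are made-up values for illustration, not the ones from this reprex, and I am assuming std_err is the standard deviation of the per-fold estimates divided by sqrt(n), which is my understanding of what tune reports. The actual per-fold values should be available via collect_metrics(linreg_res, summarize = FALSE).

```r
# Hypothetical per-fold RMSEs (illustrative values, not from the reprex).
fold_rmse <- c(1100, 1350, 1180, 1290, 1205)

n       <- length(fold_rmse)        # number of folds contributing
avg     <- mean(fold_rmse)          # what the `mean` column would show
std_err <- sd(fold_rmse) / sqrt(n)  # assumed definition: sd / sqrt(n)

print(c(n = n, mean = avg, std_err = std_err))
```

A standard error only exists here because there are several per-fold estimates to spread over; a single RMSE computed over all rows at once has no such spread.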

The metrics computed from the data stack are single RMSEs taken over the assessment-set (out-of-fold) predictions from all folds combined.

diamonds_tbl
#> # A tibble: 1 × 4
#>   price linreg_res_1_1 tree_res_1_1 tree_res_1_2
#>   <dbl>          <dbl>        <dbl>        <dbl>
#> 1     0          1243.        1399.        1301.

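In the former, the process is 1) generate predictions for each fold, 2) compute the RMSE within each fold, 3) average the RMSEs across folds. In the latter, the process is 1) generate predictions for each fold, 2) pool the assessment-set predictions from all folds (with plain v-fold CV each training row is predicted exactly once, so nothing is actually averaged at this step), 3) compute a single RMSE over all rows.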
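
The two procedures described above can be sketched in base R. This is a toy illustration under stated assumptions: the truth, pred and fold vectors are hypothetical, and the rmse() helper is defined here by hand rather than taken from yardstick.

```r
# Hypothetical toy data: 6 out-of-fold predictions spread over 2 folds.
truth <- c(10, 12, 14, 20, 22, 24)
pred  <- c(11, 12, 13, 24, 22, 20)
fold  <- c("Fold1", "Fold1", "Fold1", "Fold2", "Fold2", "Fold2")

# Hand-written RMSE helper (hypothetical, stands in for rmse_vec).
rmse <- function(truth, estimate) sqrt(mean((truth - estimate)^2))

# Route 1 (collect_metrics style): RMSE within each fold, then average.
per_fold_rmse <- sapply(
  split(seq_along(truth), fold),
  function(i) rmse(truth[i], pred[i])
)
mean_of_fold_rmse <- mean(per_fold_rmse)

# Route 2 (data stack style): one RMSE over all pooled predictions.
pooled_rmse <- rmse(truth, pred)

print(round(c(mean_of_fold_rmse = mean_of_fold_rmse,
              pooled_rmse = pooled_rmse), 2))
#> mean_of_fold_rmse       pooled_rmse 
#>              2.04              2.38
```

When the folds are the same size, the pooled RMSE is the square root of the average per-fold MSE, which can never be smaller than the average of the per-fold RMSEs. That matches the direction of the gap in the thread (1243 vs 1224, and so on), although stratified folds are only approximately equal in size.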

Created on 2023-06-11 with reprex v2.0.2

Thank you very much. Makes sense. That's why the mean of the RMSEs from collect_metrics has a std_err and the one from rmse_vec does not. Brilliant.
