Random forest using tidymodels: Use a variable instead of an integer for the mtry parameter when building a model

I have the following code:

library(tidyverse)
ikea <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-11-03/ikea.csv")

ikea_df <- ikea %>%
  dplyr::select(price, name, category, depth, height, width) %>%
  mutate(price = log10(price)) %>%
  mutate_if(is.character, factor)

ikea_df

library(tidymodels)

set.seed(123)
ikea_split <- initial_split(ikea_df, strata = price)
ikea_train <- training(ikea_split)
ikea_test <- testing(ikea_split)

set.seed(234)
ikea_folds <- bootstraps(ikea_train, strata = price)
ikea_folds

library(usemodels)
use_ranger(price ~ ., data = ikea_train)

library(textrecipes)
ranger_recipe <-
  recipe(formula = price ~ ., data = ikea_train) %>%
  step_other(name, category, threshold = 0.01) %>%
  step_clean_levels(name, category) %>%
  step_impute_knn(depth, height, width)

features <- setdiff(names(ikea_train), "price")
value <- as.numeric(floor(length(features ) / 3))

ranger_spec <-
  rand_forest(mtry = value, min_n = tune(), trees = 1001) %>%
  set_mode("regression") %>%
  set_engine("ranger")

ranger_workflow <-
  workflow() %>%
  add_recipe(ranger_recipe) %>%
  add_model(ranger_spec)

set.seed(8577)
doParallel::registerDoParallel()
ranger_tune <-
  tune_grid(ranger_workflow,
            resamples = ikea_folds,
            grid = 11
  )

Running the code above produces this warning: All models failed. Run show_notes(.Last.tune.result) for more information.

show_notes(.Last.tune.result)
unique notes:
───────────────────────────────
Error: object 'value' not found

How can I store the value variable so that it is accepted by the mtry parameter?

R 4.3.2, RStudio 2023.12.1 Build 402, Windows 11.

I cannot see any reason why the content of value would not be passed to rand_forest. As a matter of fact, I ran your code on my machine and it went through with no errors and produced the following result:

> ranger_tune
# Tuning results
# Bootstrap sampling using stratification 
# A tibble: 25 × 4
   splits              id          .metrics          .notes          
   <list>              <chr>       <list>            <list>          
 1 <split [2770/994]>  Bootstrap01 <tibble [22 × 5]> <tibble [0 × 3]>
 2 <split [2770/1003]> Bootstrap02 <tibble [22 × 5]> <tibble [0 × 3]>
 3 <split [2770/1037]> Bootstrap03 <tibble [22 × 5]> <tibble [0 × 3]>
 4 <split [2770/1010]> Bootstrap04 <tibble [22 × 5]> <tibble [0 × 3]>
 5 <split [2770/1014]> Bootstrap05 <tibble [22 × 5]> <tibble [0 × 3]>
 6 <split [2770/1007]> Bootstrap06 <tibble [22 × 5]> <tibble [0 × 3]>
 7 <split [2770/1036]> Bootstrap07 <tibble [22 × 5]> <tibble [0 × 3]>
 8 <split [2770/1016]> Bootstrap08 <tibble [22 × 5]> <tibble [0 × 3]>
 9 <split [2770/1021]> Bootstrap09 <tibble [22 × 5]> <tibble [0 × 3]>
10 <split [2770/1043]> Bootstrap10 <tibble [22 × 5]> <tibble [0 × 3]>
# ℹ 15 more rows
# ℹ Use `print(n = ...)` to see more rows

I don't know what went wrong in your run, but it seems to me that the value in the error message does not refer to your value variable. I can only suggest starting with a clean environment and trying again.

My setup: R 4.3.3, RStudio 2023.12.1 build 402, MacOS Sonoma 14.4

Based on MarekGierlinski's report, I would go one step further and encourage you to rename your variable value to something more meaningful, such as my_mtry or mtry_value.

Apparently the code runs fine on a Mac laptop; I just tested it. The issue is on my Windows machine. I have tried several things without success. For example, I tried:

....
features <- setdiff(names(ikea_train), "ntl")
my_mtry <- length(features) / 3
.....

ranger_tune <-
  tune_grid(ranger_workflow,
            resamples = ikea_folds,
            grid = 3,
            metrics = metric_set(rsq)
  )

or

....
features <- setdiff(names(ikea_train), "ntl")
my_mtry <- as.integer(length(features) / 3)
.....

ranger_tune <-
  tune_grid(ranger_workflow,
            resamples = ikea_folds,
            grid = 3,
            metrics = metric_set(rsq)
  )

and many more variations. They all give the same warning: All models failed. Run show_notes(.Last.tune.result) for more information.

show_notes(.Last.tune.result)
unique notes:
───────────────────────────────
Error: object 'value' not found

When I type typeof(my_mtry) it shows "integer", so in theory the type shouldn't be an issue. But the error still occurs on a Windows machine.
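
One way to check whether the Windows-only failure is tied to parallel processing is to switch back to a sequential backend before calling tune_grid(). This is a diagnostic sketch rather than a confirmed fix. It assumes the foreach package (which doParallel builds on) is installed and that the installed version of tune uses the foreach backend; backend_name and n_workers are illustrative names.

```r
library(foreach)

# Replace any previously registered parallel backend with the sequential one.
registerDoSEQ()

# Check which backend is active and how many workers it reports.
backend_name <- getDoParName()
n_workers <- getDoParWorkers()
print(backend_name)
print(n_workers)
```

If tune_grid() then succeeds with mtry = my_mtry, the variable is probably not reaching the parallel workers, which points at the backend rather than at the type of my_mtry.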

Maybe I should post this issue on GitHub to the creators of the package.

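Alter your ranger_spec like so. The !! operator injects the current value of my_mtry into the model specification, so the specification stores the number itself instead of a reference to a variable. That matters when the model is fitted somewhere the variable may not exist, as can happen with parallel workers on Windows:
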
ranger_spec <-
  rand_forest(mtry = !!my_mtry, min_n = tune(), trees = 1001) %>%
  set_mode("regression") %>%
  set_engine("ranger")
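
As a small sketch of the difference (assuming the parsnip and rlang packages are installed): rand_forest() captures its arguments unevaluated, so mtry = my_mtry stores the name my_mtry, which has to be looked up again when the model is fitted, whereas mtry = !!my_mtry stores the number. The names spec_lazy, spec_injected, lazy_expr and injected_expr are only illustrative.

```r
library(parsnip)
library(rlang)

my_mtry <- 1L

# Without !!, the specification keeps a reference to the variable name.
spec_lazy <- rand_forest(mtry = my_mtry)

# With !!, the current value of my_mtry is injected into the specification.
spec_injected <- rand_forest(mtry = !!my_mtry)

# Each argument is stored as a quosure; extract the expression it holds.
lazy_expr <- quo_get_expr(spec_lazy$args$mtry)
injected_expr <- quo_get_expr(spec_injected$args$mtry)

print(lazy_expr)      # the symbol my_mtry, still to be looked up later
print(injected_expr)  # the integer value itself
```

Printing either specification shows the same difference in the "Main Arguments" section.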
