I am running a random forest regression (RFR) task and want to apply the drop-column importance strategy. The basic idea of this strategy is:
to get a baseline performance score as with permutation importance, but then drop a column entirely, retrain the model, and recompute the performance score. The importance value of a feature is then the difference between the baseline and the score from the model missing that feature.
I found this strategy here and here.
Using the ranger package, how can I implement this strategy so that I end up with a final model containing only the most important predictors (as ranked by drop-column importance), and print those variables?
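library(ranger)

set.seed(42)  # make the train/test split reproducible

# Hold out one third of the rows for testing
train.idx <- sample(nrow(iris), 2/3 * nrow(iris))
iris.train <- iris[train.idx, ]
iris.test <- iris[-train.idx, ]

# Species is a factor, so this example fits a classification forest
rg.iris <- ranger(Species ~ .,
                  data = iris.train,
                  num.trees = 101,
                  importance = "permutation")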
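To make clearer what I am after, here is a rough sketch of the procedure as I understand it. It is not a tested solution and rests on several assumptions of mine: `drop_col_importance` and `oob_error` are hypothetical helper names (not part of ranger), the score is ranger's out-of-bag prediction error rather than a held-out test score, `Sepal.Length` is used as the target only to have a regression example, and "keep predictors with importance above 0" is an arbitrary cut-off. Because the score is an error (lower is better), importance is computed as the error without the feature minus the baseline error.

```r
library(ranger)

# Hypothetical helper (not part of ranger): drop-column importance
# based on the out-of-bag (OOB) prediction error.
drop_col_importance <- function(data, target, num.trees = 101, seed = 1) {
  # Fit a forest on the given predictors and return its OOB error
  # (mean squared error for a numeric target).
  oob_error <- function(predictors) {
    fit <- ranger(reformulate(predictors, response = target),
                  data = data, num.trees = num.trees, seed = seed)
    fit$prediction.error
  }

  predictors <- setdiff(names(data), target)
  baseline <- oob_error(predictors)

  # Retrain once per dropped column; a positive value means the
  # error went up when the column was removed.
  imp <- sapply(predictors, function(p) {
    oob_error(setdiff(predictors, p)) - baseline
  })
  sort(imp, decreasing = TRUE)
}

imp <- drop_col_importance(iris, target = "Sepal.Length")
print(imp)

# Assumed selection rule: keep predictors whose removal hurt the model.
keep <- names(imp)[imp > 0]
if (length(keep) == 0) keep <- names(imp)[1]  # fall back to the top-ranked one

final <- ranger(reformulate(keep, response = "Sepal.Length"),
                data = iris, num.trees = 101, seed = 1)
print(keep)
```

Fixing `seed` in every fit is meant to keep the differences from being driven by random variation between forests, though I do not know whether that is enough or whether the refits should be repeated and averaged.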
Windows 11, R 4.3.3, RStudio 2023.12.1 Build 402.