I am replicating a methodology that uses Individual Conditional Expectation (ICE) plots to explore heterogeneity in predictor effects across neighbourhoods (UK LSOA). The paper states
In this study, ICE was applied to each demographic indicator to assess whether its association with low Mapillary coverage was consistent in direction but varied in intensity, or whether it exhibited directional reversals across neighbourhoods. To summarise local effects, the slope maps of ICE curves at the LSOA centroids were computed, which highlight the spatial distribution of marginal effects.
Using the iml package, I currently fit a linear regression across each ICE curve (sampled at 5 grid points, for simplicity) and take the slope coefficient. This gives me an average slope over the feature’s domain.
Does slopes at the LSOA centroids mean I should instead compute the local derivative of the ICE curve at the observed feature value for each neighborhood, rather than the average slope across the whole curve?
Here is a simplified version of my code:
library(randomForest)
library(iml)
library(dplyr)
# Dummy dataset
data(mtcars)
# Fit a random forest
rf_fit2 <- randomForest(mpg ~ ., data = mtcars)
# Wrap in iml Predictor
predictor2 <- Predictor$new(
model = rf_fit2,
data = mtcars[, -1], # predictors only
y = mtcars$mpg
)
# Function to compute average slope across ICE curve
compute_ice_slope2 <- function(feature_name, predictor){
ice_obj <- FeatureEffect$new(
predictor,
feature = feature_name,
method = "ice",
grid.size = 5 # for simplicity
)
ice_obj$results |>
group_by(.id) |>
summarise(
slope = coef(lm(.value ~ .data[[feature_name]]))[2]
)
}
# Example: slope for 'hp'
slopes_hp2 <- compute_ice_slope2("hp", predictor2)
head(slopes_hp2)
# Plot ICE curves
plot(ice_obj)
Session Info:
> sessionInfo()
R version 4.5.2 (2025-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26200)
Matrix products: default
LAPACK version 3.12.1
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C LC_TIME=English_United States.utf8
time zone: Europe/Budapest
tzcode source: internal
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] randomForest_4.7-1.2 iml_0.11.4 GPfit_1.0-9 janitor_2.2.1 lubridate_1.9.5 forcats_1.0.1
[7] stringr_1.6.0 readr_2.2.0 tidyverse_2.0.0 patchwork_1.3.2 reshape2_1.4.5 treeshap_0.4.0
[13] future_1.69.0 fastshap_0.1.1 shapviz_0.10.3 kernelshap_0.9.1 tibble_3.3.1 doParallel_1.0.17
[19] iterators_1.0.14 foreach_1.5.2 ranger_0.18.0 yardstick_1.3.2 workflowsets_1.1.1 workflows_1.3.0
[25] tune_2.0.1 tidyr_1.3.2 tailor_0.1.0 rsample_1.3.2 recipes_1.3.1 purrr_1.2.1
[31] parsnip_1.4.1 modeldata_1.5.1 infer_1.1.0 ggplot2_4.0.2 dplyr_1.2.0 dials_1.4.2
[37] scales_1.4.0 broom_1.0.12 tidymodels_1.4.1
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.18.0 jsonlite_2.0.0 magrittr_2.0.4 farver_2.1.2 fs_1.6.6
[7] ragg_1.5.0 vctrs_0.7.1 memoise_2.0.1 sparsevctrs_0.3.6 usethis_3.2.1 curl_7.0.0
[13] xgboost_3.2.0.1 parallelly_1.46.1 KernSmooth_2.23-26 desc_1.4.3 plyr_1.8.9 cachem_1.1.0
[19] lifecycle_1.0.5 pkgconfig_2.0.3 Matrix_1.7-4 R6_2.6.1 fastmap_1.2.0 snakecase_0.11.1
[25] digest_0.6.39 furrr_0.3.1 ps_1.9.1 pkgload_1.5.0 textshaping_1.0.4 labeling_0.4.3
[31] timechange_0.4.0 compiler_4.5.2 proxy_0.4-29 remotes_2.5.0 withr_3.0.2 S7_0.2.1
[37] backports_1.5.0 DBI_1.2.3 pkgbuild_1.4.8 MASS_7.3-65 lava_1.8.2 sessioninfo_1.2.3
[43] classInt_0.4-11 tools_4.5.2 units_1.0-0 future.apply_1.20.2 nnet_7.3-20 Metrics_0.1.4
[49] doFuture_1.2.1 glue_1.8.0 callr_3.7.6 grid_4.5.2 sf_1.0-24 checkmate_2.3.4
[55] generics_0.1.4 gtable_0.3.6 tzdb_0.5.0 class_7.3-23 data.table_1.18.2.1 hms_1.1.4
[61] utf8_1.2.6 pillar_1.11.1 splines_4.5.2 lhs_1.2.0 lattice_0.22-9 sfd_0.1.0
[67] survival_3.8-6 tidyselect_1.2.1 hardhat_1.4.2 devtools_2.4.6 timeDate_4052.112 stringi_1.8.7
[73] DiceDesign_1.10 pacman_0.5.1 codetools_0.2-20 cli_3.6.5 rpart_4.1.24 systemfonts_1.3.1
[79] processx_3.8.6 dichromat_2.0-0.1 Rcpp_1.1.1 globals_0.19.0 ellipsis_0.3.2 gower_1.0.2
[85] listenv_0.10.0 viridisLite_0.4.3 ipred_0.9-15 prodlim_2025.04.28 e1071_1.7-17
Edit 1
I am not sure it's correct because “at the LSOA centroids” is almost certainly spatial language: I have one ICE curve per LSOA, and I map a scalar summary (here: a slope / marginal effect) to the centroid location of that LSOA.
Because each ICE curve is f(x_j, x_{-j,i}) as a function of feature x_j while holding the other features fixed at LSOA i’s values, a “slope map” usually aims to show a local marginal effect for each LSOA, that is the derivative (or a local finite-difference approximation) at the observed feature value for that LSOA, then plotted at that LSOA’s centroid.
I think my current approach (a linear regression over the whole ICE curve) is an average slope over the feature’s domain, which is a different estimand and can hide sign changes or curvature (exactly the thing ICE is meant to reveal).
How can I create the "slope map"? This is an example of the "slope map" from the paper I am following:
Edit 2
I found the ICEbox package and the function dice (Estimates the partial derivative function for each curve in an ice object. See Goldstein et al (2013) for further details.), which I believe is the correct approach:
## Not run:
require(ICEbox)
require(randomForest)
require(MASS) #has Boston Housing data, Pima
######## regression example
data(Boston) #Boston Housing data
X = Boston
y = X$medv
X$medv = NULL
## build a RF:
bhd_rf_mod = randomForest(X, y)
## Create an 'ice' object for the predictor "age":
bhd.ice = ice(object = bhd_rf_mod, X = X, y = y, predictor = "age", frac_to_build = .1)
# make a dice object:
bhd.dice = dice(bhd.ice)
summary(bhd.dice)
print(bhd.dice)
str(bhd.dice)
> bhd.dice = dice(bhd.ice)
Estimating derivatives using Savitzky-Golay filter (window = 15 , order = 2 )
> summary(bhd.dice)
dice object generated on data with n = 51 for predictor "age"
predictor considered continuous, logodds off
> print(bhd.dice)
dice object generated on data with n = 51 for predictor "age"
predictor considered continuous, logodds off
> str(bhd.dice)
List of 15
$ gridpts : num [1:48] 2.9 8.9 16.3 18.5 21.5 26.3 29.1 31.9 33 34.9 ...
$ predictor : chr "age"
$ xj : num [1:51] 2.9 8.9 16.3 18.5 21.5 26.3 29.1 31.9 33 34.9 ...
$ logodds : logi FALSE
$ probit : logi FALSE
$ xlab : chr "age"
$ nominal_axis: logi FALSE
$ range_y : num 45
$ sd_y : num 9.2
$ Xice :Classes ‘data.table’ and 'data.frame':51 obs. of 13 variables:
..$ crim : num [1:51] 0.1274 0.2141 0.1621 0.0724 0.0907 ...
..$ zn : num [1:51] 0 22 20 60 45 45 45 95 12.5 22 ...
..$ indus : num [1:51] 6.91 5.86 6.96 1.69 3.44 3.44 3.44 2.68 6.07 5.86 ...
..$ chas : int [1:51] 0 0 0 0 0 0 0 0 0 0 ...
..$ nox : num [1:51] 0.448 0.431 0.464 0.411 0.437 ...
..$ rm : num [1:51] 6.77 6.44 6.24 5.88 6.95 ...
..$ age : num [1:51] 2.9 8.9 16.3 18.5 21.5 26.3 29.1 31.9 33 34.9 ...
..$ dis : num [1:51] 5.72 7.4 4.43 10.71 6.48 ...
..$ rad : int [1:51] 3 7 3 4 5 5 5 4 4 7 ...
..$ tax : num [1:51] 233 330 223 411 398 398 398 224 345 330 ...
..$ ptratio: num [1:51] 17.9 19.1 18.6 18.3 15.2 15.2 15.2 14.7 18.9 19.1 ...
..$ black : num [1:51] 385 377 397 392 378 ...
..$ lstat : num [1:51] 4.84 3.59 6.59 7.79 5.1 2.87 4.56 2.88 8.79 9.16 ...
..- attr(*, ".internal.selfref")=<externalptr>
$ pdp : Named num [1:48] 21.8 21.8 21.8 21.8 21.8 ...
..- attr(*, "names")= chr [1:48] "2.9" "8.9" "16.3" "18.5" ...
$ d_ice_curves: num [1:51, 1:48] 0.007971 0.003749 0.000212 -0.006136 0.00869 ...
$ dpdp : num [1:48] -0.000231 -0.000111 -0.000159 -0.001541 -0.002096 ...
$ actual_deriv: num [1:51] 0.00797 0.00359 0.00232 -0.01326 0.01083 ...
$ sd_deriv : num [1:48] 0.00316 0.0031 0.00489 0.00968 0.00671 ...
- attr(*, "class")= chr "dice"
I think what I need to extract bhd.dice$actual_deriv and merge it back to my dataset. Am I right?
