Rule Fit with Tidymodels `Rules`

These might be redundant if they were used in a tree-based model, but RuleFit adds them to a linear model along with the original predictor. This lets the linear model capture nonlinear effects of the predictors used in the splits.
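To make the "rules as extra columns" idea concrete, here is a minimal base R sketch of the design matrix a linear model sees when rule indicators sit next to the original predictor. The `toy` data frame and its values are made up for illustration, and only two of the split points are used to keep the output small; this is not how the rules package builds its features internally.

```r
# Hypothetical flipper lengths (mm), chosen only to straddle the split points
toy <- data.frame(flipper_length_mm = c(180, 197.5, 210, 224.5, 227, 230))

# One-sided formula: the original predictor plus one indicator per rule
rules_f <- ~ flipper_length_mm +
  I(flipper_length_mm < 227) +
  I(flipper_length_mm >= 197.5)

# model.matrix() expands each logical rule into a 0/1 column
X <- model.matrix(rules_f, data = toy)
print(X)
```

Each rule column lets the fitted line jump at its split point, while the `flipper_length_mm` column keeps a common slope between the jumps.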

It ends up being similar to a really crude spline model. Similar approaches are discussed in Feature Engineering and Selection.

For the example above, a linear regression shows the nonlinearity, although it is not very strong here:

library(broom)
library(ggplot2)

data(penguins, package = "modeldata")

penguins <- penguins[complete.cases(penguins), ]

f <- body_mass_g ~ flipper_length_mm + I(flipper_length_mm < 227) +
  I(flipper_length_mm < 228.5) + I(flipper_length_mm >= 197.5) +
  I(flipper_length_mm >= 224.5)

pen_fit <- lm(f, data = penguins)
grid <- data.frame(flipper_length_mm = seq(170, 234, by = 1 / 4))
pen_res <- augment(pen_fit, newdata = grid)

ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) + 
  geom_point(alpha = 1 / 2) +
  geom_line(data = pen_res, aes(y = .fitted), col = "red") +
  geom_vline(xintercept = c(227, 228.5, 197.5, 224.5), lty = 3) +
  theme_bw()

Created on 2022-09-07 with reprex v2.0.2
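One rough way to put a number on "not very strong" is to compare the fit against a plain linear model with a nested-model F test. This is only a sketch: it assumes the modeldata package is installed, and `lin_fit`, `rules_lm`, and `cmp` are names introduced here, not part of the reprex.

```r
data(penguins, package = "modeldata")
penguins <- penguins[complete.cases(penguins), ]

# Baseline: flipper length enters linearly only
lin_fit <- lm(body_mass_g ~ flipper_length_mm, data = penguins)

# Same model plus the four rule indicators used in the post
rules_lm <- lm(
  body_mass_g ~ flipper_length_mm + I(flipper_length_mm < 227) +
    I(flipper_length_mm < 228.5) + I(flipper_length_mm >= 197.5) +
    I(flipper_length_mm >= 224.5),
  data = penguins
)

# Nested-model F test: do the rule terms reduce the residual sum of squares?
cmp <- anova(lin_fit, rules_lm)
print(cmp)
```

Because the splits were picked by looking at the same data, the resulting p-value is optimistic and is better read as a descriptive check than as a formal test.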
