Hi,
I am looking at a data set across groups and trying to predict \hat{y} using tidymodels. It seemed like a good candidate for the MARS algorithm, since MARS can find interactions automatically. My overarching plan was to fit MARS and then use a partial dependence plot (maybe via the pdp package) to show the obvious interaction that I artificially created in grp_1 and grp_2. Unfortunately, either I have chosen the wrong model or I can't see how to set it up without manually splitting the data and running a separate regression for each group. Can anyone help? Is there any way to disentangle this without manually splitting the groups?
Thanks very much for your time
library(tidymodels)
library(tidyverse)
options(scipen = 999)
grp_1 <- seq(2, -2, length.out = 50) %>%
  enframe() %>%
  mutate(grp = 1) %>%
  mutate(y_hat = seq(16, 4, length.out = 50))
grp_2 <- seq(2, -2, length.out = 50) %>%
  enframe() %>%
  mutate(grp = 2) %>%
  mutate(y_hat = seq(4, 16, length.out = 50))
mydf <- bind_rows(grp_1, grp_2) %>%
  select(x = value, grp, y_hat) %>%
  mutate(grp = as.character(grp))
norm_recipe <- recipe(y_hat ~ ., data = mydf) %>%
  step_dummy(grp) %>%
  prep(training = mydf, retain = TRUE)
juice(norm_recipe) %>% glimpse()
fit_mars <- mars() %>%
  set_engine("earth") %>%
  set_mode("regression") %>%
  fit(y_hat ~ ., data = juice(norm_recipe))
test_results <- predict(fit_mars, new_data = juice(norm_recipe)) %>%
  rename(fit_mars = .pred) %>%
  bind_cols(y_hat = mydf$y_hat)
ggplot(mydf, aes(x = x, y = y_hat, colour = grp)) +
  geom_hline(yintercept = test_results$fit_mars) +
  geom_point() +
  theme_minimal() +
  ggtitle('Relationship between x and y_hat with a made up interaction')
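
For reference, the structure of the made-up data can be checked without MARS. The sketch below is not the MARS setup from the post: it swaps in an ordinary linear regression with an explicit x-by-grp interaction term, and it rebuilds the same values in base R under the same name mydf purely so it runs on its own. It is only meant to show that, pooled over both groups, x has essentially no linear relationship with y_hat (the two slopes cancel), while a single model with an interaction term recovers a separate slope per group without splitting the data.

```r
# Rebuild the toy data from the post in base R (same values as mydf above).
x <- seq(2, -2, length.out = 50)
mydf <- data.frame(
  x     = c(x, x),
  grp   = rep(c("1", "2"), each = 50),
  y_hat = c(seq(16, 4, length.out = 50), seq(4, 16, length.out = 50))
)

# Pooled over both groups the opposite slopes cancel, so x alone
# carries essentially no linear signal about y_hat.
pooled_cor <- cor(mydf$x, mydf$y_hat)
print(pooled_cor)

# One model, no manual splitting: `x * grp` expands to x + grp + x:grp,
# so each group gets its own slope via the x:grp2 interaction coefficient.
fit_lm <- lm(y_hat ~ x * grp, data = mydf)
print(coef(fit_lm))
```

By construction the slope is 3 in group 1 and -3 in group 2, so the x coefficient should come out near 3 and the x:grp2 coefficient near -6. Whether a MARS fit can pick up the same term is a separate question, since neither x nor grp has a main effect in this data.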