Hello!
I've been trying to create and save a persistent multi-fold object for repeated CV (created with createMultiFolds) so that I can come back to it later if I want to repeat the entire analysis. I basically want to avoid relying on random seeds and have the folds set in stone. That part is not hard in itself; the problem starts when I try to use it with case weights.
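A minimal sketch of the persistence part, assuming the folds are stored as an RDS file. The file name `multi_folds.rds` and the use of `tempdir()` are placeholders for illustration, not something from my actual setup:

```r
library(caret)

# Two-class subset of iris, mirroring the example further down
df_all <- droplevels(subset(iris, Species != "setosa"))

# The seed only matters the one time the folds are created
set.seed(42)
folds <- createMultiFolds(df_all$Species, k = 5, times = 5)

# Hypothetical location; any stable path would do
folds_path <- file.path(tempdir(), "multi_folds.rds")
saveRDS(folds, folds_path)

# In a later session: reload the stored folds instead of regenerating them
folds_saved <- readRDS(folds_path)
identical(folds, folds_saved)
```

The reloaded list can then be passed to `trainControl(index = ...)` in place of a freshly generated one.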
I've experimented with several approaches, including creating a 'case weight' variable role in recipes, but none of them works the way I want. I don't want to use sampling instead, so that's not really an option here.
In the end I thought that perhaps, when using multi-folds, I also need the weights in a similar list structure (one element per resample, like the createMultiFolds output), but apparently that's not the case (you can see it in my example).
Please find a reproducible example below; it doesn't work as of now. The error thrown is:
Warning messages:
1: model fit failed for Fold1.Rep1: alpha=0.00, lambda=1 Error in (function (x, y, family = c("gaussian", "binomial", "poisson", :
number of elements in weights (25) not equal to the number of rows of x (80)
Full code below:
set.seed(42)
# Loading libraries -------------------------------------------------------
library(magrittr)
library(tidyverse)
library(tidymodels)
library(caret) # provides createMultiFolds(), trainControl() and train()
library(dials)
library(furrr)
# Loading input dataset ---------------------------------------------------
df_all <- iris %>%
  filter(Species != "setosa") %>%
  mutate(Species = factor(Species, levels = c("versicolor", "virginica")))
# Preparing the recipes ----------------------------------------------------
# I need to add a custom step over here on the missing patterns
en_rec <- df_all %>%
  recipe(Species ~ .) %>%
  step_pca(all_predictors(), num_comp = 2)
# Training models ---------------------------------------------------------
folds <- createMultiFolds(df_all$Species, k = 5, times = 5)
ctrl <- trainControl(
  method = "repeatedcv",
  number = 5,
  repeats = 5,
  index = folds,
  verboseIter = TRUE,
  summaryFunction = defaultSummary,
  returnResamp = "final",
  savePredictions = "final"
)
en_grid <- expand.grid(
  alpha = c(0, .25, .50, .75, 1),
  lambda = 10 ^ seq(-4, 0, length = 30)
)
en_model <- train(
  en_rec,
  data = df_all,
  method = "glmnet",
  trControl = ctrl,
  tuneGrid = en_grid,
  weights = map(folds, ~if_else(df_all$Species[.x] == "versicolor", 10, 1))
)
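The two numbers in the warning can be checked against the objects directly. The sketch below rebuilds the folds and the weights list with base R only (`lapply`/`ifelse` standing in for `map`/`if_else`), then prints the lengths involved. The plain per-row vector `w_vec` at the end is only there for comparison and is an assumption about what a single weight per observation would look like; the sketch does not establish whether train() accepts it together with a recipe.

```r
library(caret)

df_all <- droplevels(subset(iris, Species != "setosa"))

set.seed(42)
folds <- createMultiFolds(df_all$Species, k = 5, times = 5)

# Same structure as the weights argument above: one list element per resample
w_list <- lapply(
  folds,
  function(idx) ifelse(df_all$Species[idx] == "versicolor", 10, 1)
)

length(w_list)       # number of resamples (5 folds x 5 repeats)
lengths(folds)[1:3]  # rows in the first few training sets
nrow(df_all)         # rows in the full data set

# For comparison only: one weight per row of df_all (an assumption, see above)
w_vec <- ifelse(df_all$Species == "versicolor", 10, 1)
length(w_vec)
```

So the 25 in the message matches the number of elements in the list, and the 80 matches the size of a single training set, which suggests the whole list reaches the model fit as-is rather than being split up per resample.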