I am using caret::safs() for supervised feature selection, and I'm trying to better understand how to set the resampling scheme using trainControl and safsControl. Both seem to have options to set the resampling method, number, and repeats. I've been reading through the docs and what examples I can find, and I'm not clear on whether I need to set the resampling scheme in both or just one.
The caret package book even notes that the options are similar between the two functions:
Some important options to safsControl are:
method, number, repeats, index, indexOut, etc: options similar to those for train to control resampling.
My questions boil down to the following:
- If resampling should be set in both, why?
- If resampling only needs to be defined in one of them, which one?
Here's a representative (non-runnable) example of the code I'm using to run safs:
library(caret)

# Set the resampling scheme in trainControl
train_ctrl <- trainControl(method = "repeatedcv",
                           number = 10,
                           repeats = 3,
                           classProbs = TRUE,
                           summaryFunction = twoClassSummary,
                           savePredictions = "final",
                           # FALSE here but TRUE below so as not to square the number of workers
                           allowParallel = FALSE)

# Use the two-class summary (ROC, Sens, Spec) for the external fitness estimate
caretSA$fitness_extern <- twoClassSummary

# Also set it in safsControl - is this needed?
safs_ctrl <- safsControl(functions = caretSA,
                         method = "repeatedcv",
                         number = 10,
                         repeats = 3,
                         metric = c(internal = "ROC", external = "ROC"),
                         maximize = c(internal = TRUE, external = TRUE),
                         allowParallel = TRUE,
                         verbose = TRUE)

# my_recipe and training_data are not shown here
sa_results <- safs(my_recipe,
                   data = training_data,
                   iters = 10,
                   method = "glm",
                   # Are both of these needed?
                   trControl = train_ctrl,
                   safsControl = safs_ctrl)