My question may be a bit unusual, but:
I'm working on a university project in which I have to use a random forest model to predict whether patients have depressive tendencies. I am getting results, but I'm not sure they are valid because of how I set the seed. Here is a code snippet:
library(caret)         # createDataPartition(), trainControl(), train(), confusionMatrix()
library(randomForest)  # randomForest()

for (t in 1:5) {
  set.seed(123)
  seed <- sample.int(100)  # random permutation of 1:100
  set.seed(seed)
  # Seeds for trainControl: a list of 50 integer vectors of length 12
  seeds <- vector(mode = "list", length = 50)
  for (i in 1:50) {
    seeds[[i]] <- sample.int(1000, 12)
  }
  # For the last model: a single seed
  seeds[[50]] <- sample.int(1000, 1)

  yourdata_neu$Depressiv <- as.factor(yourdata_neu$Depressiv)
  # 70% of the participants go into training, 30% into test
  inTraining <- createDataPartition(yourdata_neu$Depressiv, p = 0.70, list = FALSE)
  training <- yourdata_neu[inTraining, ]
  testing <- yourdata_neu[-inTraining, ]

  train_control <- trainControl(method = "cv", number = 10, verboseIter = TRUE,
                                seeds = seeds, search = "grid")
  # Random forest tuned through caret (the outcome is the last column)
  model <- train(training[, 1:(ncol(yourdata_neu) - 1)],
                 as.factor(training[, ncol(yourdata_neu)]),
                 method = "rf", type = "classification", metric = "Accuracy",
                 maximize = TRUE, trControl = train_control, importance = TRUE)
  # Plain randomForest fit without caret
  model1 <- randomForest(training[, 1:(ncol(yourdata_neu) - 1)],
                         as.factor(training[, ncol(yourdata_neu)]),
                         type = "classification", importance = TRUE, proximity = TRUE)

  prediction1 <- predict(model1, testing[, 1:(ncol(yourdata_neu) - 1)])
  prediction2 <- predict(model, testing[, 1:(ncol(yourdata_neu) - 1)])
  print(confusionMatrix(prediction2, as.factor(testing[, ncol(yourdata_neu)]), positive = "1"))
}
Basically, I set the seed to 123 at the beginning of each loop iteration. After that I generate the numbers 1 to 100 in random order and store them in a variable, which becomes the new seed for the whole model. The variable "seeds" is for my trainControl and is a list with 50 entries: entries 1 to 49 each hold 12 numbers, and the last one is overwritten with a single number after the inner seed loop. I repeat these steps in every iteration of the outer loop.
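A minimal base-R sketch of just the seeding steps described above, in case it helps to reproduce the behaviour without caret. The names make_seeds, run_a and run_b are made up for this example, and it passes only seed[1] to set.seed(), on the assumption that set.seed() works with a single integer.

```r
# Hypothetical helper that repeats the seeding part of one loop iteration.
make_seeds <- function() {
  set.seed(123)              # reset at the start of every iteration
  seed <- sample.int(100)    # random permutation of 1:100
  set.seed(seed[1])          # assumption: only a single integer is used as seed
  seeds <- vector(mode = "list", length = 50)
  for (i in 1:50) {
    seeds[[i]] <- sample.int(1000, 12)
  }
  seeds[[50]] <- sample.int(1000, 1)  # last entry overwritten with one number
  seeds
}

run_a <- make_seeds()
run_b <- make_seeds()

identical(run_a, run_b)  # TRUE, because both calls start from set.seed(123)
lengths(run_a)           # number of seeds stored in each list entry
```

Because every call starts from set.seed(123), both runs produce exactly the same list, which would be consistent with the accuracy not changing between iterations.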
The accuracy I get is the same in every iteration. Since I have to write a scientific paper about my model, I'm not sure whether I can set my seed the way I did. Is there another way to make my results reproducible for academic purposes? I'm grateful for any input.
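For comparison, a hedged sketch of one possible variant in which each repetition gets its own fixed base seed instead of always restarting from 123. It assumes the list layout described in the caret documentation for the seeds argument of trainControl (one integer vector per resample plus a single integer for the final model). The names build_seeds, base_seeds, n_folds and n_models are hypothetical, and the seed values are arbitrary.

```r
# Illustrative assumptions, not taken from the original code:
n_folds    <- 10                         # matches number = 10 in trainControl
n_models   <- 12                         # assumed upper bound on tuning candidates
base_seeds <- c(101, 202, 303, 404, 505) # one arbitrary base seed per repetition

# Hypothetical helper: builds a seeds list with n_folds + 1 entries.
build_seeds <- function(base_seed, n_folds, n_models) {
  set.seed(base_seed)
  seeds <- vector(mode = "list", length = n_folds + 1)
  for (i in seq_len(n_folds)) {
    seeds[[i]] <- sample.int(1000, n_models)   # one vector per resample
  }
  seeds[[n_folds + 1]] <- sample.int(1000, 1)  # single seed for the final model
  seeds
}

# One seeds list per repetition, each reproducible from its base seed.
all_seeds <- lapply(base_seeds, build_seeds,
                    n_folds = n_folds, n_models = n_models)
```

Inside the loop, repetition t could then call set.seed(base_seeds[t]) before createDataPartition and pass seeds = all_seeds[[t]] to trainControl, so that the repetitions differ from each other but each one can be rerun from its listed base seed.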