Set seed for machine learning algorithms

angela_italy · April 15, 2023, 7:43am

Hello,
Can anyone clarify the best procedure for using set.seed() before running machine learning algorithms?
I have built a random forest model, a GBM model, and a BART model.
Does each of them require a seed to give reproducible results?
I have not split my dataset into training and test sets.
I have seen many examples for random forest, but I am not sure whether this is also required for BART and GBM.
An example of my models:

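library(dbarts)  # assumed source of bart() (x.train, y.train, keeptrees arguments)
library(gbm)     # provides gbm()

# BART is fit by MCMC sampling, so its results depend on the random number stream
set.seed(500)
mod_BART <- bart(x.train = dataset[ , preds_selected], y.train = dataset[ , 1], keeptrees = TRUE)
summary(mod_BART)

# Building the formula uses no random numbers, so the seed can go right before gbm()
formula_GBM <- as.formula(paste("presence ~", paste(preds_selected, collapse = "+")))
set.seed(500)
mod_GBM <- gbm(formula_GBM, data = dataset, distribution = "bernoulli")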
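Whether a model needs a seed depends on whether its fitting function draws random numbers. BART is fit by MCMC sampling, and gbm()'s default bag.fraction = 0.5 subsamples the training rows at each boosting iteration, so both are stochastic. A rough, general check is to see whether a call changes R's global RNG state (`.Random.seed`). The helper `uses_global_rng()` below is a hypothetical sketch, not part of any package; it would miss a function that restores `.Random.seed` afterwards or uses its own internal generator.

```r
# Hypothetical helper: TRUE if calling f(...) changed R's global RNG state.
# Note: it calls set.seed(1) first, which resets the current random stream.
uses_global_rng <- function(f, ...) {
  set.seed(1)  # ensures .Random.seed exists in the global environment
  before <- get(".Random.seed", envir = globalenv())
  f(...)
  after <- get(".Random.seed", envir = globalenv())
  !identical(before, after)
}

uses_global_rng(mean, 1:10)    # FALSE: mean() is deterministic
uses_global_rng(sample, 1:10)  # TRUE: sample() draws random numbers
```

The same check could be pointed at a real model call, e.g. `uses_global_rng(gbm, formula_GBM, data = dataset, distribution = "bernoulli")`; that call has not been run here.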

Also, how many times should I set the seed?
If the models are in the same script, is it enough to set the seed only once, before the first model?
Thanks a lot
Angela

Max · April 15, 2023, 4:34pm

I am pretty thorough (neurotic?) about setting it right before calling any function that uses random numbers.

In theory, you could set the seed once at the top of the script and be fine.
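A minimal sketch of the single-seed approach, with `sample()` and `runif()` as stand-ins for the random draws inside each model fit (the function name `run_script` is hypothetical): running the same "script" twice with one `set.seed()` at the top reproduces every later draw, as long as the code itself does not change.

```r
# Stand-in for a whole analysis script with one seed at the top.
run_script <- function() {
  set.seed(500)
  first_model  <- sample(100, 3)  # stand-in for the random draws of model 1
  second_model <- runif(2)        # stand-in for the random draws of model 2
  list(first_model, second_model)
}

identical(run_script(), run_script())  # TRUE: same seed, same code, same draws
```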

However, most people doing interactive data analysis make changes to the script as they go. Adding, removing, or reordering any code that draws random numbers shifts the point in the random number stream where later calls start, so re-running the altered script would not reproduce the earlier results.
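To illustrate the point above with hypothetical stand-ins (`sample()`, `rnorm()`, and `runif()` in place of real model fits): adding one line that consumes random numbers changes what the second "model" receives, even though the single seed at the top is unchanged.

```r
original_script <- function() {
  set.seed(500)
  first_model  <- sample(100, 3)
  second_model <- runif(2)
  second_model
}

edited_script <- function() {
  set.seed(500)
  first_model  <- sample(100, 3)
  new_step     <- rnorm(1)  # line added later; it consumes random numbers
  second_model <- runif(2)  # now drawn from a later point in the stream
  second_model
}

identical(original_script(), edited_script())  # FALSE
```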

For me, the rule is: set it right before each step that uses random numbers.
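A sketch of the per-step approach, again with stand-in draws and a hypothetical function name: because the second "model" gets its own `set.seed()` call, code added earlier in the script no longer changes its results.

```r
per_call_script <- function(add_new_step) {
  set.seed(500)                           # seed right before the first random step
  first_model <- sample(100, 3)
  if (add_new_step) new_step <- rnorm(1)  # code added later
  set.seed(501)                           # seed right before the second random step
  second_model <- runif(2)
  second_model
}

identical(per_call_script(FALSE), per_call_script(TRUE))  # TRUE
```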

I also run sample.int(10000, 5) to generate the seed values themselves at random (five distinct integers between 1 and 10000). Again, that might be more than you need.
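One possible reading of this practice, sketched under that assumption: generate candidate seeds once with `sample.int()`, paste them into the script, and give each model its own stored seed. The values in `seeds` are made-up placeholders, and `runif()`/`sample()` stand in for the real bart() and gbm() calls.

```r
# Step 1 (run once, interactively): let R pick five seeds between 1 and 10000.
sample.int(10000, 5)

# Step 2: paste the printed values into the script. These particular numbers
# are hypothetical placeholders, not output from the call above.
seeds <- c(8123L, 417L, 2950L, 6604L, 91L)

set.seed(seeds[1])
bart_draws <- runif(3)   # stand-in for the bart() call
set.seed(seeds[2])
gbm_draws <- sample(10)  # stand-in for the gbm() call
```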

system · April 28, 2023, 10:36am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
