Setting the seed for machine learning algorithms

Hello,
Can anyone clarify the best procedure for calling set.seed() before running machine learning algorithms?
I have built a random forest model, a gbm model and a bart model.
Does each of them require a seed for reproducible results?
I have not split my dataset into train and test.
I have seen a lot of examples for random forest but I am not sure if this is required for BART and GBM as well.
An example of my models:

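library(dbarts)  # assumption: bart() with a keeptrees argument is the dbarts version
set.seed(500)    # seed immediately before the stochastic fit
mod_BART <- bart(x.train = dataset[ , preds_selected], y.train = dataset[ , 1], keeptrees = TRUE)
summary(mod_BART)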

library(gbm)
formula_GBM <- as.formula(paste("presence ~", paste(preds_selected, collapse = " + ")))
set.seed(500)  # seed immediately before the stochastic fit
mod_GBM <- gbm(formula_GBM, data = dataset, distribution = "bernoulli")

Also, how many times should I set the seed?
If the models are in the same script, is it enough to set the seed once before the first model?
Thanks a lot
Angela

I am pretty thorough (neurotic?) about setting it before a function is called that uses random numbers.

In theory, you can set the seed once at the top of the script, and you would be fine.

However, most people doing interactive data analysis are going to make changes to the script as they go. Any change that adds, removes, or reorders calls that draw random numbers shifts the random number stream for everything after it, so re-running the altered script would not reproduce the earlier results.

My recommendation: set it immediately before each call that uses random numbers.
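
To make the point about a broken random number stream concrete, here is a minimal sketch in base R. The functions fit_a() and fit_b() are hypothetical stand-ins for model fits (they are not real modelling calls), and the seed values are arbitrary.

```r
# Hypothetical stand-ins for model fits: any function that draws random numbers.
fit_a <- function() mean(rnorm(100))
fit_b <- function() mean(runif(100))

# One seed at the top: reproducible only while the script stays unchanged.
set.seed(500)
a1 <- fit_a()
b1 <- fit_b()

# Same seed, but an extra random draw was added before fit_b().
set.seed(500)
a2 <- fit_a()
extra <- sample(10)  # line added later during interactive work
b2 <- fit_b()

identical(a1, a2)  # TRUE: nothing changed before fit_a()
identical(b1, b2)  # FALSE: the stream was shifted before fit_b()

# Seeding right before each call isolates it from edits made above it.
set.seed(500)
a3 <- fit_a()
extra <- sample(10)
set.seed(501)
b3 <- fit_b()
set.seed(501)
b4 <- fit_b()

identical(b3, b4)  # TRUE: same seed directly before the same call
```

The same pattern should carry over to real model calls, provided the fitting function actually draws from R's random number generator.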

I also run sample.int(10000, 5) to randomly generate seeds. Again, that might be more than you need.
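
A rough sketch of how that could look in practice. The variable names and the literal seed values below are placeholders for illustration only. The important part is that the drawn seeds get copied into the script as fixed numbers, since calling sample.int() again would give different seeds.

```r
# Run once, interactively, to draw five candidate seeds between 1 and 10000.
candidate_seeds <- sample.int(10000, 5)
print(candidate_seeds)

# Copy the printed values into the script as literals.
# These two numbers are illustrative placeholders, not recommended values.
seed_bart <- 4821
seed_gbm  <- 1937

# Each seed is then set right before the step it belongs to.
set.seed(seed_gbm)
x1 <- rnorm(3)
set.seed(seed_gbm)
x2 <- rnorm(3)

identical(x1, x2)  # TRUE: the hard-coded seed reproduces the draw
```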

Hello, this is Gulshan Negi.
From what I found online, you can set the seed multiple times, once before each section of code that involves randomization, to ensure reproducibility within that section. Keep in mind that setting the seed does not guarantee the same results across platforms or software versions, but it does make the random draws repeatable when the same code is run in the same environment.
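
Because results can depend on the environment, one option is to record the RNG settings and software versions next to the results. This is only a sketch using base R functions, and the variable names are made up for illustration. As one known example of a version effect, R 3.6.0 changed the default method used by sample().

```r
# Record which random number generators are in use.
# In recent R versions this includes the method used by sample().
rng_settings <- RNGkind()
print(rng_settings)

# Record the R release that produced the results.
r_version <- R.version.string
print(r_version)

# sessionInfo() additionally lists the platform and loaded package versions.
info <- sessionInfo()
```

Saving this output with the model results makes it easier to explain differences if the script is later re-run elsewhere.
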
Thanks
