set.seed() in the context of tidymodels

John_Rambo · March 16, 2022, 5:33pm

I know we have to set a seed whenever we want a reproducible random result. But what is the reason for setting a seed when tuning models, like here?
Where is the random part, and for which step is set.seed() necessary?

nirgrahamuk · March 16, 2022, 5:50pm

The first encounter with randomness in the text you linked to relates to the train/test data split.

John_Rambo · March 16, 2022, 5:54pm

I know. But why and where in detail?

Max · March 21, 2022, 4:48pm

You would need to know about the computations that are being done.

For example, you would set the seed before initial_split() since it uses random numbers.
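
A minimal sketch of that point, assuming the rsample package is installed. The mtcars data, the 75% proportion, and the seed value 123 are arbitrary choices for illustration, not something from the linked text:

```r
library(rsample)

# Same seed immediately before each call: the same rows land in training.
set.seed(123)
split_a <- initial_split(mtcars, prop = 0.75)

set.seed(123)
split_b <- initial_split(mtcars, prop = 0.75)

# No seed reset here, so this call continues the random number stream
# and will almost certainly pick a different set of rows.
split_c <- initial_split(mtcars, prop = 0.75)

identical(training(split_a), training(split_b))  # TRUE
identical(training(split_a), training(split_c))
```

The first comparison is the reproducibility you get from set.seed(); the second shows what happens when the split just uses whatever state the generator happens to be in.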

You might also use it before calling one of the tune_*() functions (or similar) if

- the model uses random numbers (like random forests)
- you are using the grid function to have tidymodels make a tuning parameter grid for you

and so on. It really depends.
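
To illustrate the grid case, here is a small sketch assuming the dials package is installed. dials::grid_random() with the penalty() and mixture() parameters is only a stand-in for whatever grid tidymodels would build for you; the parameters, grid size, and seed value are arbitrary:

```r
library(dials)

# A randomly generated tuning grid depends on the random number stream,
# so seeding right before it makes the candidate values reproducible.
set.seed(42)
grid_a <- grid_random(penalty(), mixture(), size = 5)

set.seed(42)
grid_b <- grid_random(penalty(), mixture(), size = 5)

identical(grid_a, grid_b)  # TRUE
```

Without the seed, each run would tune over a different set of candidate values, so the "best" parameters could change from run to run.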

If you are never going to change your script, you can set the seed once at the top. That is a pretty bad assumption, though, so we often set it multiple times in a script.
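
A rough base R sketch of why a single seed at the top is fragile. Here sample(), rnorm(), and runif() are hypothetical stand-ins for a data split, a newly added step, and a later random step; the seed values are arbitrary:

```r
# One seed at the top, original script.
set.seed(1)
idx_v1   <- sample(10, 5)  # stand-in for a data split
noise_v1 <- runif(3)       # stand-in for a later random step

# Same seed at the top, but a new random step was added in the middle.
set.seed(1)
idx_v2   <- sample(10, 5)
extra    <- rnorm(1)       # new step consumes random numbers
noise_v2 <- runif(3)

identical(idx_v1, idx_v2)      # TRUE: nothing changed before this step
identical(noise_v1, noise_v2)  # FALSE: the stream was shifted by the new step

# Seeding again right before the later step isolates it from such edits.
set.seed(1)
idx_v3   <- sample(10, 5)
set.seed(2)
noise_v3 <- runif(3)

set.seed(1)
idx_v4   <- sample(10, 5)
extra    <- rnorm(1)
set.seed(2)
noise_v4 <- runif(3)

identical(noise_v3, noise_v4)  # TRUE
```

With a seed before each random step, editing one part of the script does not silently change the results of the steps after it.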
