set.seed() with simulations

rmarkdown
rstudio

#1

Hello everyone,

I am running some (big) simulations in R, and of course I would like to save the results each time I run them, both for reproducibility and for examining the results. I actually work in R Markdown, but I don't think that plays a role, since it is the same for R scripts in general.

More precisely, I am running a simulation on mixed models and I want to do 2 steps:

  1. Run 500 simulated datasets and do whatever I want to do with them.
  2. Re-run exactly the same simulation but with 1000 datasets. BUT, I want the first 500 to be exactly the same as in step 1.

And indeed, by setting the seed I managed to do that! In step 2, the first 500 are identical to those of step 1.

However, I would expect the time required for step 2 to be equal to that of step 1. That is, restore the 500 simulated datasets already run in step 1 and then run another 500. But to my surprise, step 2 took exactly twice as long, as if it were running all 1000 simulated datasets from scratch...

Do you know why this happens? Is it normal for the simulations to be re-run even though I have set the seed? Are they not saved that way?

I hope I explained my situation well! If not, please ask me for further explanation...

Thank you,
John


#2

It would be helpful to see a minimal reproducible example that demonstrates this.

But from what you've described, I think the thing that is taking time in your example is the simulation runs, not the generation of random numbers.

All set.seed does is set the initial seed for R's pseudo-random number generators. It does not store any results. Presumably then, your simulation does a bunch of other things that take time to compute (the "do whatever i want to do with them," part of your steps).
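To illustrate the point, here is a minimal sketch. `simulate_one()` is a hypothetical stand-in for generating one dataset, not the original poster's code. It suggests that setting the same seed reproduces the same draws, but every draw is still computed again:

```r
# Hypothetical stand-in for generating one simulated dataset:
# 10 standard-normal draws (an assumption, not the real simulation).
simulate_one <- function(n_obs = 10) rnorm(n_obs)

# Step 1: 500 datasets from a fixed seed.
set.seed(123)
run_500 <- lapply(1:500, function(i) simulate_one())

# Step 2: 1000 datasets from the same seed.
set.seed(123)
run_1000 <- lapply(1:1000, function(i) simulate_one())

# The first 500 datasets match exactly, yet all 1000 were generated again;
# nothing from step 1 was stored or reused.
same_first_500 <- identical(run_500, run_1000[1:500])
print(same_first_500)
```

The seed only fixes where the random number stream starts, so any expensive work done on each dataset is repeated on every run.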

Again, if you can provide a minimal reprex, I think it'll be super helpful for getting to the bottom of this.


#3

As @EconomiCurtis mentioned, a reprex would be helpful.

In the meantime, I want to ask you what do you think happens when you set a seed? In other words, why do you expect that datasets should be saved? From the ?set.seed page:

.Random.seed is an integer vector, containing the random number generator (RNG)
state for random number generation in R. It can be saved and restored, 
but should not be altered by the user.
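As a rough sketch of what "saved and restored" means here (the variable names are made up for illustration), the RNG state can be captured at some point and put back later, which replays the draws that followed it. Only the generator state is kept, not any results:

```r
set.seed(123)
first_half <- rnorm(5)

# Capture the RNG state after the first 5 draws.
saved_state <- get(".Random.seed", envir = globalenv())

second_half <- rnorm(5)

# Restore the saved state (unmodified) and draw again:
# the same 5 numbers come out.
assign(".Random.seed", saved_state, envir = globalenv())
replayed <- rnorm(5)

print(identical(second_half, replayed))
```

This makes it possible to continue a random stream from a known point, but the earlier values still have to be stored separately if they are to be reused without recomputation.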

One thing that might save you some time is memoization. It caches the result of a simulation and reuses it the second time if the input is exactly the same. However, it is difficult to say whether it will work in your case without looking at the code.
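Without seeing the actual code, here is only a sketch of the idea in base R. The function `simulate_cached()`, the per-dataset seeding scheme (`base_seed + i`), and the toy computation are all assumptions for illustration. Each dataset index gets its own seed, and its result is stored in an environment so that a second request for the same index is a lookup rather than a recomputation:

```r
# In-memory cache keyed by dataset index (hypothetical design).
cache <- new.env()

simulate_cached <- function(i, base_seed = 123) {
  key <- as.character(i)
  cached <- cache[[key]]
  if (!is.null(cached)) return(cached)   # reuse the earlier result

  # Seed per dataset so dataset i does not depend on how many
  # datasets were generated before it (a simplification).
  set.seed(base_seed + i)
  result <- mean(rnorm(100))             # stand-in for the expensive model fit

  cache[[key]] <- result
  result
}

step1 <- vapply(1:500, simulate_cached, numeric(1))
# Only datasets 501 to 1000 are computed here; 1 to 500 come from the cache.
step2 <- vapply(1:1000, simulate_cached, numeric(1))

print(identical(step1, step2[1:500]))
```

This cache lives only for the current R session. To keep results between sessions, they could be written with `saveRDS()` and read back with `readRDS()`. The memoise package offers a ready-made wrapper, `memoise::memoise()`, for the same idea.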


#4

Thank you both for your replies!

@mishabalyasin I believe your question is really what I was looking for... By using set.seed(), the only thing that is saved is the way the data is randomly generated, not the data itself. So, this way I get the same results each time.

So, if I do the 2 steps with set.seed() and also cache the results in my R Markdown file, I will have exactly what I need!

Thanks again for your feedback! It was really helpful :smile:

Thanks,
John