Hi,
First of all, I wanna mention that @mattwarkentin provided an excellent explanation on the topic.
Regarding your latest questions:
Every algorithm or piece of code that uses random numbers (which is pretty much all of machine learning) will produce (slightly) different results every time you run it. Setting a seed does not mean that the same random number is generated over and over; it means that the sequence of random numbers is the same every time you start from that seed.
set.seed(1)
runif(5)
[1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819
runif(5)
[1] 0.89838968 0.94467527 0.66079779 0.62911404 0.06178627
If you run this code, you will get the exact same output. But note that the second round of random numbers is completely different from the first one. This means that each time random numbers are needed after you set the seed, they will be different from the previous draw, yet reproducible.
set.seed(1)
runif(10)
[1] 0.26550866 0.37212390 0.57285336 0.90820779 0.20168193 0.89838968 0.94467527 0.66079779 0.62911404 0.06178627
Here I draw all 10 at once, and the sequence of random numbers is the same as the two rounds of 5 above.
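If you would rather check this in code than compare the printed numbers by eye, here is a minimal sketch. The variable names are just mine for illustration.

```r
# Draw two rounds of 5 after setting the seed
set.seed(1)
two_rounds <- c(runif(5), runif(5))

# Reset the seed and draw all 10 at once
set.seed(1)
one_round <- runif(10)

# The seed fixes the sequence, so both approaches yield the same numbers
identical(two_rounds, one_round)  # TRUE
```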
Usually there should be no issue or bias with this, as long as your dataset is large and not sparse. There is no easy way to define what dataset is large enough, but there is an easy way to find out whether randomness matters: set a seed (before splitting the data), run all of your code, and store the results. Then change the seed, run the code again, and compare the results. If you do this a few times and the results are highly similar, the choice of seed is not biasing them.
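A rough sketch of that check could look like the code below. Everything in it is a stand-in for your own pipeline: `evaluate_with_seed` is a hypothetical helper I made up, and the built-in `mtcars` data, the `lm` model, the 80/20 split and the RMSE metric are only assumptions for the sake of a runnable example.

```r
# Hypothetical helper: split, fit and score under a given seed.
# Replace the model and the metric with whatever your own code uses.
evaluate_with_seed <- function(seed, data, train_frac = 0.8) {
  set.seed(seed)  # set the seed before splitting the data
  n <- nrow(data)
  train_idx <- sample(n, size = floor(train_frac * n))
  train <- data[train_idx, ]
  test  <- data[-train_idx, ]
  fit  <- lm(mpg ~ wt + hp, data = train)
  pred <- predict(fit, newdata = test)
  sqrt(mean((test$mpg - pred)^2))  # RMSE on the held-out rows
}

# Run the same code under a few different (arbitrary) seeds
seeds <- c(1, 42, 123, 2024, 99)
rmse_by_seed <- sapply(seeds, evaluate_with_seed, data = mtcars)

# Compare the stored results; a large spread relative to the mean
# suggests that the random split matters for this dataset
print(rmse_by_seed)
print(sd(rmse_by_seed) / mean(rmse_by_seed))
```

With only 32 rows, `mtcars` is on the small side, so I would expect the results to move around noticeably between seeds here.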
If the results are significantly different, your dataset is either too small or too sparse. In those cases, splitting the data creates sets with different distributions of inputs or outputs, and performance will then depend on which split you happened to get. Large or dense (not sparse) sets maintain the distribution of the data even when split.
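To illustrate the effect of size on the split, here is a small simulation. The data are made up (a binary outcome with exactly 10% positives), and `test_positive_rate` is a hypothetical helper, not something from a package.

```r
# Hypothetical helper: share of positives in a random 20% test split
test_positive_rate <- function(seed, y, test_frac = 0.2) {
  set.seed(seed)
  test_idx <- sample(length(y), size = floor(test_frac * length(y)))
  mean(y[test_idx])
}

# Made-up binary outcomes, both with exactly 10% positives
y_small <- rep(c(1, 0), times = c(5, 45))        # 50 rows
y_large <- rep(c(1, 0), times = c(5000, 45000))  # 50,000 rows

seeds <- 1:20
small_rates <- sapply(seeds, test_positive_rate, y = y_small)
large_rates <- sapply(seeds, test_positive_rate, y = y_large)

# Range of the positive rate in the test split across seeds
print(range(small_rates))
print(range(large_rates))
```

In the small set, the test split has only 10 rows, so its share of positives can only move in steps of 0.1. In the large set, it should stay close to 0.1 whichever seed you use.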
Hope this helps,
PJ