Your points 2 and 3 seem to say the same thing in different words... or is there a relevant difference?
Some questions for you:
What assumptions of regression analysis are you referring to? How do you know that the database in question violates those assumptions?
How might a 'suitable random generator' address those defects?