I'm trying to use R to randomly split a population into two groups, but I want an even distribution of age and sex in both groups. It's been suggested that I use a counterbalance method, but I'm not sure how. Can anyone help, please?
The implication is that when you naively split your data at random, the two resulting subpopulations are biased and neither represents the larger population they are both drawn from? I'm sceptical about that, but it's possible there are details not described in your post that would explain it.
Practically speaking, I think the next tool in the toolkit would be stratified sampling. Sex is trivial: it's a category with 2 levels. Age you could band, off the top of my head maybe into thirds, and use that banded age as a stratifier.
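A minimal base R sketch of that idea, in case it helps. The data frame `pop`, its columns and the simulated values are made up for illustration; swap in your own data. It bands age into thirds, crosses the bands with sex to form strata, and then splits each stratum as evenly as possible at random.

```r
set.seed(42)  # only so the example is reproducible

# Hypothetical population: replace with your own data frame
pop <- data.frame(
  id  = 1:200,
  age = sample(18:80, 200, replace = TRUE),
  sex = sample(c("F", "M"), 200, replace = TRUE)
)

# Band age into thirds using the sample terciles
breaks <- quantile(pop$age, probs = c(0, 1/3, 2/3, 1))
pop$age_band <- cut(pop$age, breaks = breaks, include.lowest = TRUE)

# One stratum per sex x age-band combination
pop$stratum <- interaction(pop$sex, pop$age_band, drop = TRUE)

# Within each stratum, hand out labels 1,2,1,2,... in random order.
# The starting label is random too, so strata with an odd number of
# members do not always give the extra person to the same group.
pop$group <- NA_integer_
for (idx in split(seq_len(nrow(pop)), pop$stratum)) {
  labels <- rep_len(sample(1:2), length(idx))
  pop$group[idx] <- labels[sample.int(length(idx))]
}

# Check the balance
table(pop$stratum, pop$group)
tapply(pop$age, pop$group, mean)
```

Within each stratum the two group counts differ by at most one, so sex and banded age come out nearly balanced. Age within a band is still left to chance.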
Warning: the following may be overkill.
If your age variable is discrete (say, an integer number of years), you can use an optimization model. Assign observations to clusters based on age and gender (e.g., 29-year-old females). Use an integer linear programming model to decide how many members of each cluster to assign to each of your two groups, with the objective of balancing age and gender. Armed with a solution, assign individuals randomly where possible. If a cluster contains n observations with m < n to be assigned to group 1, select those m randomly.
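Here is one way that could look, as a rough sketch rather than a definitive model. It assumes the `lpSolve` package is installed and uses a small made-up data frame `pop`. The objective (total absolute deviation from half the population in group size, number of females and summed age) is just one possible choice, and restricting each cluster to a near-even split is my own simplification, not something the approach requires.

```r
library(lpSolve)  # assumed to be installed: install.packages("lpSolve")

set.seed(1)
# Hypothetical population, kept small so the integer program solves quickly
pop <- data.frame(
  id  = 1:60,
  age = sample(25:30, 60, replace = TRUE),
  sex = sample(c("F", "M"), 60, replace = TRUE)
)

# One cluster per observed (age, sex) combination, with its size n
cl <- aggregate(id ~ age + sex, data = pop, FUN = length)
names(cl)[3] <- "n"
K <- nrow(cl)

# Decision variables: x_1..x_K (how many of each cluster go to group 1)
# followed by three deviation variables d_size, d_sex, d_age (all >= 0).
w <- rbind(size   = rep(1, K),
           female = as.numeric(cl$sex == "F"),
           age    = cl$age)
target <- as.vector(w %*% cl$n) / 2  # half of each population total

const.mat <- rbind(
  cbind(diag(K), matrix(0, K, 3)),  # x_c >= floor(n_c / 2)
  cbind(diag(K), matrix(0, K, 3)),  # x_c <= ceiling(n_c / 2)
  cbind(w, -diag(3)),               # w x - d <= target
  cbind(w,  diag(3))                # w x + d >= target
)
const.dir <- c(rep(">=", K), rep("<=", K), rep("<=", 3), rep(">=", 3))
const.rhs <- c(floor(cl$n / 2), ceiling(cl$n / 2), target, target)
objective <- c(rep(0, K), 1, 1, 1)  # minimise the summed deviations

sol <- lp("min", objective, const.mat, const.dir, const.rhs, int.vec = 1:K)
stopifnot(sol$status == 0)          # 0 means a solution was found
cl$m <- round(sol$solution[1:K])    # members of each cluster for group 1

# Randomly pick which m members of each cluster go to group 1
pop$group <- 2L
for (k in seq_len(K)) {
  members <- which(pop$age == cl$age[k] & pop$sex == cl$sex[k])
  chosen  <- members[sample.int(length(members), cl$m[k])]
  pop$group[chosen] <- 1L
}

table(pop$group, pop$sex)
tapply(pop$age, pop$group, mean)
```

The three deviations are weighted equally here even though summed age is on a different scale from the counts, so you may want to weight them differently. With many clusters the solver can take noticeably longer.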