splitting choice experiment (predict prob + assess accuracy)

What you are looking for is stratified sampling, and many packages offer this capability.

For example, take a look at the initial_split function in the rsample package. It has a strata argument that you can set to make sure your training and test splits both keep roughly the same class proportions of the target variable as the original dataset (e.g., if you have 90% of class 0 and 10% of class 1, then both training and testing will have roughly 90% of class 0 and 10% of class 1).
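To make the idea behind strata concrete, here is a minimal sketch in plain Python. It is not rsample's implementation, only an illustration of stratified splitting: group the row indices by class, shuffle within each class, and take the same proportion from every class for training. The function name stratified_split, the 80/20 proportion, and the toy labels are illustrative assumptions.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, prop=0.8, seed=42):
    """Hypothetical helper: split row indices so each class keeps ~its share."""
    rng = random.Random(seed)

    # Group row indices by their class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train, test = [], []
    for idx in by_class.values():
        # Shuffle within the class, then take the same proportion for training.
        rng.shuffle(idx)
        n_train = round(len(idx) * prop)
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return sorted(train), sorted(test)


# Toy data: 90% class 0, 10% class 1.
labels = [0] * 90 + [1] * 10
train_idx, test_idx = stratified_split(labels, prop=0.8)

print(Counter(labels[i] for i in train_idx))  # 72 zeros, 8 ones
print(Counter(labels[i] for i in test_idx))   # 18 zeros, 2 ones
```

Both splits keep the 90/10 ratio. In R, the corresponding call would be along the lines of initial_split(data, prop = 0.8, strata = your_target_column), where your_target_column is a placeholder for your outcome variable.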

Also, your error does not necessarily come from incorrect splitting, but it's difficult to say more without a reproducible example. Here is some info on how to create one:
