I want to divide my data set into train and test data, but one of my columns identifies a group, and all members of a group must end up together in either train or test. For example, if the group column looks like this:
group
1
1
1
1
1
2
2
2
3
3
If one row of the first group is in the train set, then all 5 rows of that group must be there too, and ...
I think the easiest approach is to construct the test and training populations by sampling the distinct values of the group column rather than sampling rows. Let's say your data are in a data frame named DF, there are ten groups labeled 1 to 10, and you want the training sample to contain 7 of the groups.
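The idea can be sketched as follows. This is only an illustration under some assumptions: the question does not name a language, so plain Python is used here, DF is represented as a list of row dicts, and the `value` column, the three rows per group, and the helper name `split_by_group` are all made up for the example.

```python
import random

# Hypothetical stand-in for DF: one dict per row, with a "group" column.
# Ten groups labeled 1-10, three rows each (sizes invented for illustration).
DF = [{"group": g, "value": i} for g in range(1, 11) for i in range(3)]

def split_by_group(rows, n_train_groups, seed=None):
    """Split rows so that every group lands entirely in train or in test."""
    rng = random.Random(seed)
    # Sample from the distinct group labels, not from the rows themselves.
    groups = sorted({row["group"] for row in rows})
    train_groups = set(rng.sample(groups, n_train_groups))
    # Assign each row according to the set its group was drawn into.
    train = [row for row in rows if row["group"] in train_groups]
    test = [row for row in rows if row["group"] not in train_groups]
    return train, test

# 7 of the 10 groups go to training; the remaining 3 form the test set.
train, test = split_by_group(DF, n_train_groups=7, seed=0)
```

Because whole groups are sampled, the share of rows that ends up in training is only approximately 70% when the groups differ in size. If DF is a pandas data frame, the same selection could be done by filtering on `DF["group"].isin(train_groups)`.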