# Making Combinations of Items

Suppose I have the following vectors of factor levels:

```r
factor_1 <- c("A1", "A2", "A3")
factor_2 <- c("B1", "B2")
factor_3 <- c("C1", "C2", "C3", "C4")
factor_4 <- c("D1", "D2", "D3")
```

I made the following data frame, which contains all 3 × 2 × 4 × 3 = 72 combinations of these factors:

```r
data_exp <- expand.grid(factor_1, factor_2, factor_3, factor_4)
data_exp$id <- 1:nrow(data_exp)
head(data_exp)
#>   Var1 Var2 Var3 Var4 id
#> 1   A1   B1   C1   D1  1
#> 2   A2   B1   C1   D1  2
#> 3   A3   B1   C1   D1  3
#> 4   A1   B2   C1   D1  4
#> 5   A2   B2   C1   D1  5
#> 6   A3   B2   C1   D1  6
```
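As a quick check on the arithmetic, the number of rows `expand.grid()` produces is the product of the vector lengths. A small sketch; the names `levels_list` and `n_combos` are only illustrative:

```r
factor_1 <- c("A1", "A2", "A3")
factor_2 <- c("B1", "B2")
factor_3 <- c("C1", "C2", "C3", "C4")
factor_4 <- c("D1", "D2", "D3")

# Number of combinations = product of the vector lengths (3 * 2 * 4 * 3)
levels_list <- list(factor_1, factor_2, factor_3, factor_4)
n_combos <- prod(lengths(levels_list))

# expand.grid() also accepts a single list of vectors
data_exp <- expand.grid(levels_list)
data_exp$id <- seq_len(nrow(data_exp))

# One row per combination, and no combination appears twice
stopifnot(nrow(data_exp) == n_combos, anyDuplicated(data_exp[1:4]) == 0)
n_combos
#> [1] 72
```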

I want to randomly split this data frame (`data_exp`) into 3 datasets such that each row appears in exactly one of them. The 3 datasets do not have to be the same size. I tried to do this with the following code.

First, I generate 3 random integers, one for the number of rows in each dataset, such that they add up to 72:

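```r
# https://stackoverflow.com/questions/24845909/generate-n-random-integers-that-sum-to-m-in-r
# Returns N random integers that sum to M (all non-negative when pos.only = TRUE)
rand_vect <- function(N, M, sd = 1, pos.only = TRUE) {
  # Draw N values around M / N, then rescale and round so they sum to roughly M
  vec <- rnorm(N, M / N, sd)
  if (abs(sum(vec)) < 0.01) vec <- vec + 1
  vec <- round(vec / sum(vec) * M)

  # Rounding can leave the total off by a few; nudge random entries by +/- 1
  deviation <- M - sum(vec)
  for (. in seq_len(abs(deviation))) {
    i <- sample(N, 1)
    vec[i] <- vec[i] + sign(deviation)
  }

  # Move units from positive entries to negative ones until none is negative
  if (pos.only) while (any(vec < 0)) {
    negs <- vec < 0
    pos  <- vec > 0
    i <- sample(sum(negs), 1)
    vec[negs][i] <- vec[negs][i] + 1
    j <- sample(sum(pos), 1)
    vec[pos][j] <- vec[pos][j] - 1
  }
  vec
}

r <- rand_vect(3, 72)
r
#> [1] 26 23 23
```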
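If all that is needed is three random non-negative integers that add up to 72, base R's `rmultinom()` can produce them in a single call, as an alternative to `rand_vect()`. A minimal sketch, assuming the three groups should be equally likely (`n_rows` and `sizes` are illustrative names):

```r
set.seed(42)  # only so the example is reproducible
n_rows <- 72

# One multinomial draw: 72 "rows" spread over 3 equally likely groups.
# prob is normalised internally, so c(1, 1, 1) means 1/3 each.
sizes <- as.vector(rmultinom(1, size = n_rows, prob = c(1, 1, 1)))

stopifnot(length(sizes) == 3, sum(sizes) == n_rows)
sizes
```

Unlike `rand_vect()` with its default `sd = 1`, each count here is Binomial(72, 1/3), so it varies more (standard deviation 4). A count of 0 is possible in principle but very unlikely.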

Next, I tried to create these datasets using these random numbers:

```r
data_1 <- data_exp[sample(nrow(data_exp), r[1]), ]
data_2 <- data_exp[sample(nrow(data_exp), r[2]), ]
data_3 <- data_exp[sample(nrow(data_exp), r[3]), ]
```
The problem with this approach is that each call to `sample()` draws independently from all 72 row numbers, so `data_1`, `data_2` and `data_3` can share rows, and some rows of `data_exp` may not appear in any of them.
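One way to keep the fixed sizes from `rand_vect()` is to shuffle the row numbers once and then cut that single permutation into consecutive pieces of length `r[1]`, `r[2]` and `r[3]`, so no row can be drawn twice. A minimal sketch; here `r` is hard-coded to the values printed above, but any vector that sums to `nrow(data_exp)` works:

```r
factor_1 <- c("A1", "A2", "A3")
factor_2 <- c("B1", "B2")
factor_3 <- c("C1", "C2", "C3", "C4")
factor_4 <- c("D1", "D2", "D3")
data_exp <- expand.grid(factor_1, factor_2, factor_3, factor_4)
data_exp$id <- seq_len(nrow(data_exp))

# Sizes as printed above; in practice r <- rand_vect(3, 72)
r <- c(26, 23, 23)
stopifnot(sum(r) == nrow(data_exp))

set.seed(1)
shuffled <- sample.int(nrow(data_exp))       # one random permutation of 1..72
group <- rep(seq_along(r), times = r)        # 26 ones, then 23 twos, then 23 threes
parts <- split(data_exp[shuffled, ], group)  # list of 3 non-overlapping data frames

data_1 <- parts[[1]]
data_2 <- parts[[2]]
data_3 <- parts[[3]]
```

Because every row of the shuffled data frame receives exactly one group label, the three pieces are disjoint and together contain all 72 rows.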

Is there a way to fix this problem?

Thank you!

Hope this is of some use to you:

```r
factor_1 <- c("A1", "A2", "A3")
factor_2 <- c("B1", "B2")
factor_3 <- c("C1", "C2", "C3", "C4")
factor_4 <- c("D1", "D2", "D3")

data_exp <- expand.grid(factor_1, factor_2, factor_3, factor_4)
data_exp$id <- 1:nrow(data_exp)

set.seed(1234)
# Give every row exactly one random label (1, 2 or 3), then subset by label
idx <- sample(3, size = nrow(data_exp), replace = TRUE, prob = c(0.33, 0.33, 0.34))
df1 <- data_exp[idx == 1, ]
df2 <- data_exp[idx == 2, ]
df3 <- data_exp[idx == 3, ]
```
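The labelling idea in this answer can also be written with `split()`, which returns all three pieces as a list. A sketch, assuming equal probabilities are acceptable (so `prob` is dropped; keep it if unequal weights are wanted). Setting the factor levels explicitly keeps three list elements even if, by chance, one label never occurs:

```r
factor_1 <- c("A1", "A2", "A3")
factor_2 <- c("B1", "B2")
factor_3 <- c("C1", "C2", "C3", "C4")
factor_4 <- c("D1", "D2", "D3")
data_exp <- expand.grid(factor_1, factor_2, factor_3, factor_4)
data_exp$id <- seq_len(nrow(data_exp))

set.seed(1234)
idx <- sample(3, size = nrow(data_exp), replace = TRUE)

# One list element per label; rows with the same label end up together
parts <- split(data_exp, factor(idx, levels = 1:3))
sizes <- vapply(parts, nrow, integer(1))

stopifnot(length(parts) == 3, sum(sizes) == nrow(data_exp))
sizes
```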

Thank you so much! Ideally I would like the numbers of rows in df1, df2 and df3 to be fully random and still add up to nrow(data_exp). Is this possible?
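If "fully random" is taken to mean that every way of writing 72 as an ordered sum of three non-empty sizes is equally likely, one option is to pick two distinct random cut points among the 71 gaps between rows. This is a different technique from both `rand_vect()` (whose default `sd = 1` keeps the sizes close to 24) and the per-row labels above (whose sizes also cluster around 24). A minimal sketch; `random_sizes()` and `split_random_sizes()` are hypothetical helper names:

```r
# Hypothetical helper: k positive sizes summing to n, all compositions equally likely
random_sizes <- function(n, k) {
  stopifnot(k >= 1, k <= n)
  if (k == 1) return(n)
  # Choose k - 1 distinct cut points among the n - 1 gaps between consecutive rows
  cuts <- sort(sample.int(n - 1, k - 1))
  diff(c(0, cuts, n))
}

# Hypothetical helper: shuffle rows once, then cut them into blocks of those sizes
split_random_sizes <- function(df, k) {
  sizes <- random_sizes(nrow(df), k)
  group <- rep(seq_len(k), times = sizes)
  split(df[sample.int(nrow(df)), ], group)
}

factor_1 <- c("A1", "A2", "A3")
factor_2 <- c("B1", "B2")
factor_3 <- c("C1", "C2", "C3", "C4")
factor_4 <- c("D1", "D2", "D3")
data_exp <- expand.grid(factor_1, factor_2, factor_3, factor_4)
data_exp$id <- seq_len(nrow(data_exp))

set.seed(2024)
parts <- split_random_sizes(data_exp, 3)
df1 <- parts[[1]]
df2 <- parts[[2]]
df3 <- parts[[3]]
vapply(parts, nrow, integer(1))
```

Each size is at least 1 here; if empty pieces should also be allowed, the cut points would need to be drawn differently.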

Hi,
I checked before posting: the sizes are random, and the numbers of rows in the three data frames add up to the number of rows in the original dataset.
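A check like the one described can be written down explicitly by comparing the `id` columns of the pieces against the original. A sketch; `check_partition()` is a hypothetical helper, not a base R function:

```r
factor_1 <- c("A1", "A2", "A3")
factor_2 <- c("B1", "B2")
factor_3 <- c("C1", "C2", "C3", "C4")
factor_4 <- c("D1", "D2", "D3")
data_exp <- expand.grid(factor_1, factor_2, factor_3, factor_4)
data_exp$id <- 1:nrow(data_exp)

set.seed(1234)
idx <- sample(3, size = nrow(data_exp), replace = TRUE, prob = c(0.33, 0.33, 0.34))
df1 <- data_exp[idx == 1, ]
df2 <- data_exp[idx == 2, ]
df3 <- data_exp[idx == 3, ]

# Hypothetical helper: are the pieces disjoint, and do they cover every row?
check_partition <- function(parts, original) {
  ids <- unlist(lapply(parts, `[[`, "id"), use.names = FALSE)
  c(sizes_sum_ok = sum(vapply(parts, nrow, integer(1))) == nrow(original),
    disjoint     = anyDuplicated(ids) == 0,
    complete     = setequal(ids, original$id))
}

check_partition(list(df1, df2, df3), data_exp)
```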

