Creating all possible combinations of a data-frame.

I have a simple data frame that contains six samples (nrows = 6).
I am trying to make all possible combinations of the data frame with 4 samples each (nrows = 4).
The example data looks like this:

Samples=c("control1","control2","control3","control4","control5","control6")
dosage_mg<-c(0,0,0,0,0,0)

df=data.frame(Samples,dosage_mg)
df

#I want to find all possible combinations of this data frame, each containing 4 samples. I have tried the code below, but it just repeats all the values in the sample.

expand.grid(Samples,dosage_mg)
d1 <- expand.grid(Samples = Samples, dosage_mg = dosage_mg)

#I tried the code above and it did not work.

The result should be a list of all possible data frames with 4 rows each.
The resulting data frames may look like the following:


Samples=c("control4","control5","control6","control2")
dosage_mg<-c(0,0,0,0)

df_COMBINATION_1=data.frame(Samples,dosage_mg)

Samples=c("control3","control1","control6","control2")
dosage_mg<-c(0,0,0,0)

df_COMBINATION_2=data.frame(Samples,dosage_mg)

Samples=c("control1","control5","control2","control3")
dosage_mg<-c(0,0,0,0)

df_COMBINATION_3=data.frame(Samples,dosage_mg)

head(df_COMBINATION_1)
head(df_COMBINATION_2)
head(df_COMBINATION_3)

and the list goes on until all possible combinations are achieved.

Thank you in advance!

You can use combn to get all combinations of 4 different samples, then use that to iteratively create the new dataframes, e.g.:

# Get all possible combinations of 4 samples
# (simplify = FALSE already returns a list, so no as.list() wrapper is needed)
sample_combinations <- combn(df$Samples, 4, simplify = FALSE)

# Create dataframes
combination_dfs <- lapply(sample_combinations,
                          function(x) data.frame(Samples = x, dosage_mg = rep(0,4)))
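As a quick sanity check on the approach above, here is a minimal sketch that rebuilds the example data from the question, confirms that the number of combinations matches choose(6, 4), and names the list elements. The df_COMBINATION_ naming just mirrors the question and is an assumption, not something combn does for you.

```r
# Rebuild the example data from the question
Samples <- paste0("control", 1:6)
dosage_mg <- rep(0, 6)
df <- data.frame(Samples, dosage_mg, stringsAsFactors = FALSE)

# simplify = FALSE makes combn() return a list with one vector per combination
sample_combinations <- combn(df$Samples, 4, simplify = FALSE)

# The number of combinations should equal "6 choose 4"
n_expected <- choose(nrow(df), 4)
stopifnot(length(sample_combinations) == n_expected)

# One data frame per combination, dosage set to zero as in the question
combination_dfs <- lapply(
  sample_combinations,
  function(x) data.frame(Samples = x, dosage_mg = rep(0, length(x)))
)

# Assumed naming scheme, chosen only to match the names used in the question
names(combination_dfs) <- paste0("df_COMBINATION_", seq_along(combination_dfs))

combination_dfs[["df_COMBINATION_1"]]
```

Note that combn returns combinations in a fixed order, so the first data frame holds control1 to control4 rather than the shuffled order shown in the question.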

Hi,
That is really helpful. My data frame also has some other columns which I want to put in the function that you wrote above.
The column names are
"Run" "Dose" "Concentration" "Time" "exposure" "treatment"

I tried to run the line of code with

sample_combinations <- as.list(combn(DESeqDesign$Run, 4, simplify = FALSE))

This line gives a list of all possible combinations of 4 samples.
The line below gives me a list of data frames with the required number of samples, but it does not include the other columns such as "Concentration", "Time", "exposure" and "treatment".

combination_dfs <- lapply(sample_combinations,
                          function(x) data.frame(Run = x, Dose = rep(0,4)))

Looking forward to your suggestions.

The solution I provided allows you to create new dataframes based on the combinations of samples (so for the Dose column I just set it to zero in the new dataframes).

If instead you wanted to extract a subset of the original table for each new dataframe, you can do the following:

# Subset original dataframe based on samples
combination_dfs <- lapply(sample_combinations,
                          function(x, data = df) data[data$Samples %in% x,])

Replace data$Samples with whatever column you used to create the combinations (e.g. Run).

Note that this assumes the column you use (here, Samples) has unique values. If that isn't the case, your new dataframes may have more rows than you're expecting.
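If you are unsure whether the key column is unique, one alternative is to build the combinations from row numbers instead of values, which keeps every column and always yields exactly 4 rows. This is only a sketch: the design table below is a hypothetical stand-in using the column names from the thread, and all of its values are made up for illustration.

```r
# Hypothetical stand-in for the poster's table; values are invented
design <- data.frame(
  Run = paste0("run", 1:6),
  Dose = rep(0, 6),
  Concentration = rep(0, 6),
  Time = rep(24, 6),
  exposure = rep("none", 6),
  treatment = rep("control", 6),
  stringsAsFactors = FALSE
)

# anyDuplicated() returns 0 when every value is unique;
# this is the assumption the %in% approach relies on
anyDuplicated(design$Run) == 0

# Combine row indices rather than values, so duplicates cannot add extra rows
row_combinations <- combn(seq_len(nrow(design)), 4, simplify = FALSE)

# Subset the original table by row index, keeping all columns
combination_dfs <- lapply(
  row_combinations,
  function(i) design[i, , drop = FALSE]
)

combination_dfs[[1]]
```

Each element keeps the original row names, so you can still trace rows back to the source table.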


Thank you @cnbrownlie !
That was really helpful and solved my issue.

Regards
