Using a bootstrap method to create multiple data frames by resampling with replacement


I am working on a dataset, and after some discussion with my group, we suspect that one or more of our controls may differ from the others. The motivation is to see whether one or more controls were affected differently by the solvent they were kept in.

I have been advised to use a bootstrap method. Suppose I have a dataset with 5 controls and 3 treated samples. I want to create 5 new data frames, each of which skips one of the 5 controls and fills the gap by resampling the remaining controls with replacement. I want to see how stable the controls are and whether skipping a specific control changes the DEG results.

Let us suppose that the original data frame is like this:

x <- round(matrix(rexp(480 * 10, rate=.1), ncol=8), 0)

rownames(x) <- paste("gene", 1:nrow(x))

colnames(x) <-c("control1","control2","control3","control4","control5","treated","treated","treated")

I want to create 5 new data frames (as there are 5 controls in this study), where each data frame skips one specific control and replaces it with a copy of another control (which means some other control will appear twice).

For example, one of the 5 data frames could look like this:

x1 <- round(matrix(rexp(480 * 10, rate=.1), ncol=8), 0)

rownames(x1) <- paste("gene", 1:nrow(x1))

colnames(x1) <-c("control1","control1.1","control3","control4","control5","treated","treated","treated")


You can see that this new data frame replaces control2 with a copy of control1 called control1.1.
The motivation is to see how stable the controls are and whether one specific control is affecting the results of the differential gene expression analysis done with DESeq2.

Thank you!

This takes a data.frame and makes 5 versions that are identical except that each one has a different 'control' column renamed to 'ignoreme':

x <- round(matrix(rexp(480 * 10, rate=.1), ncol=8), 0)

rownames(x) <- paste("gene", 1:nrow(x))

colnames(x) <-c("control1","control2","control3","control4","control5","treated","treated","treated")

(controls <- colnames(x)[startsWith(colnames(x),"cont")])

(frames_5 <- lapply(controls, function(n){
  colnames(x)[which(colnames(x)==n)] <- "ignoreme"
  x  # return the modified copy
}))

Thanks, that is really helpful. I believe I can convert the nested list into individual data frames.
But is it possible for the ignoreme column to contain a duplicate of one of the four remaining control columns? The motivation is to remove one control column and replace it with a copy of one of the existing control columns (which could be any of them).

Thank you again!

What does it mean to "contain a duplicate out of the four existing columns"? You have a set of 5 control columns, and I thought this corresponded to the 5 frames you asked for.
In the example code you provided, there is first a data frame with control columns 1 to 5. In the second, which you showed as an example, it seems all you did was rename control2 to control1.1, whereas I renamed it to ignoreme. What should be duplicated? In each of the 5 cases, what should be done?

So now we have 5 versions of the main data frame. Sorry about the confusion.
I would like to replace the ignoreme column with a copy of any other existing control column. For example, with 5 versions:
Version one will have:
control1, control2, control3, control4, and control1.1 (a duplicate of control1), plus all treatments

Version two will have:
control1, control2, control3, control5, and control3.1 (a duplicate of control3), plus all treatments
and so on. The duplicate can be any control except the one that was removed.

I hope that makes sense :)
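
One way the versions described above could be built is sketched below. It is only a sketch under some assumptions: the helper name leave_one_out is made up for illustration, the duplicate is chosen at random with sample(), the seed is fixed only so the result is reproducible, and the duplicated column is appended after the remaining controls. Keep in mind that an exact copy of a control column adds no independent information, so any sample table passed to DESeq2 would need to be rebuilt to match each version.

```r
set.seed(1)  # assumption: fixed seed so the random duplicates are reproducible

x <- round(matrix(rexp(480 * 10, rate = .1), ncol = 8), 0)
rownames(x) <- paste("gene", 1:nrow(x))
colnames(x) <- c("control1", "control2", "control3", "control4", "control5",
                 "treated", "treated", "treated")

controls <- colnames(x)[startsWith(colnames(x), "cont")]

# Hypothetical helper: for each control, drop it and add a copy of one
# randomly chosen remaining control, named "<source>.1".
leave_one_out <- function(mat, controls) {
  versions <- lapply(controls, function(dropped) {
    remaining <- setdiff(controls, dropped)
    dup_from  <- sample(remaining, 1)            # control to duplicate
    kept      <- mat[, colnames(mat) != dropped, drop = FALSE]
    dup       <- mat[, dup_from, drop = FALSE]
    colnames(dup) <- paste0(dup_from, ".1")
    # controls first, then the duplicate, then the treated columns
    is_control <- colnames(kept) %in% controls
    cbind(kept[, is_control, drop = FALSE],
          dup,
          kept[, !is_control, drop = FALSE])
  })
  names(versions) <- paste0("without_", controls)
  versions
}

frames_5 <- leave_one_out(x, controls)

# Inspect which columns each version ended up with
lapply(frames_5, colnames)
```

Each element of frames_5 is a matrix with 8 columns (4 original controls, 1 duplicate, 3 treated), and a single version can be pulled out by name, for example frames_5[["without_control2"]].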
