# Loop for combining different datasets performing all the possible combinations

I have a bunch of datasets (the number is not defined) with the same column names and identified by the factor reported in the column "factor". I'd like to obtain all the multiple combinations of them. Here is reported what I'd like to obtain (for the example of 3 different datasets):

``````frame_x = data.frame(a=c(12,10,3), b=c(6,4,2), c=c(3,62,3),factor=c("x","x","x"))
frame_y = data.frame(a=c(2,13,34), b=c(22,13,36), c=c(22,13,34),factor=c("y","y","y"))
frame_z = data.frame(a=c(36,28,11), b=c(32,24,16), c=c(33,22,17),factor=c("z","z","z"))

frame_x_new = rbind(frame_x,frame_y,frame_z)
frame_x_new\$factor = c("x","x","x","other","other","other","other","other","other")
frame_y_new = rbind(frame_y,frame_x,frame_z)
frame_y_new\$factor = c("y","y","y","other","other","other","other","other","other")
frame_z_new = rbind(frame_z,frame_x,frame_y)
frame_z_new\$factor = c("z","z","z","other","other","other","other","other","other")

frame_x<-frame_x_new
frame_y<-frame_y_new
frame_z<-frame_z_new
``````

Here's what I'd like to obtain if I have 3 datasets, performing the combinations of x vs y+z, y vs x+z and z vs x+y combinations.
I'd like to have a loop doing this since I might have also a higher (eg. 4 datasets, where I'd like to have x vs y+z+w, etc) or a lower (e.g. 2 datasets, x vs y) number of data frames.

Hi,

Here is one implementation:

``````options(stringsAsFactors = F) #ensure correct merging of data frames

#Example data frames
frame_x = data.frame(a=c(12,10,3), b=c(6,4,2), c=c(3,62,3),factor=c("x","x","x"))
frame_y = data.frame(a=c(2,13,34), b=c(22,13,36), c=c(22,13,34),factor=c("y","y","y"))
frame_z = data.frame(a=c(36,28,11), b=c(32,24,16), c=c(33,22,17),factor=c("z","z","z"))

#You have to specify here which data frames you like to combine
allDf = list(frame_x,frame_y, frame_z)

nDf = length(allDf)

#Merge the data frames, every time different one first
allComparisons = lapply(1:nDf, function(x){
#Merge all df, but change the order of the first one in the list
mergeDf = do.call(rbind, allDf[c(x, c(1:nDf)[-x])])
#Set the new factor names, only keeing the names for the first one
mergeDf\$factor = c(allDf[[x]]\$factor, rep("other", nrow(allDf[[x]]) * (nDf - 1)))
mergeDf
})

#Check result:
allComparisons
[]
a  b  c factor
1 12  6  3      x
2 10  4 62      x
3  3  2  3      x
4  2 22 22  other
5 13 13 13  other
6 34 36 34  other
7 36 32 33  other
8 28 24 22  other
9 11 16 17  other

allComparisons
[]
a  b  c factor
1  2 22 22      y
2 13 13 13      y
3 34 36 34      y
4 12  6  3  other
5 10  4 62  other
6  3  2  3  other
7 36 32 33  other
8 28 24 22  other
9 11 16 17  other

``````

I'm sure there also is a Tidyverse implementation of this, but this one was the one that I could think of first. Note that I used `lapply` to avoid having to use a loop. Also, the `do.call` function is a way of calling an operation over multiple object in a list, so that's how I could do the `rbind` over all data frames.

In your real code, there are better ways to assign all data frames to the list allDf, but this depends on how those data is generated (loaded from different files or created in previous code).

Hope this helps,
PJ

1 Like

Dear @pieterjanvc, thank you so much for your help! It completely solved the issue I was dealing with.
Thanks a lot!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.