Loop for combining different datasets performing all the possible combinations

I have a bunch of datasets (the number is not defined) with the same column names and identified by the factor reported in the column "factor". I'd like to obtain all the multiple combinations of them. Here is reported what I'd like to obtain (for the example of 3 different datasets):

frame_x = data.frame(a=c(12,10,3), b=c(6,4,2), c=c(3,62,3),factor=c("x","x","x"))
frame_y = data.frame(a=c(2,13,34), b=c(22,13,36), c=c(22,13,34),factor=c("y","y","y"))
frame_z = data.frame(a=c(36,28,11), b=c(32,24,16), c=c(33,22,17),factor=c("z","z","z"))

frame_x_new = rbind(frame_x,frame_y,frame_z)
frame_x_new$factor = c("x","x","x","other","other","other","other","other","other")
frame_y_new = rbind(frame_y,frame_x,frame_z)
frame_y_new$factor = c("y","y","y","other","other","other","other","other","other")
frame_z_new = rbind(frame_z,frame_x,frame_y)
frame_z_new$factor = c("z","z","z","other","other","other","other","other","other")

frame_x<-frame_x_new
frame_y<-frame_y_new
frame_z<-frame_z_new

Here's what I'd like to obtain if I have 3 datasets, performing the combinations of x vs y+z, y vs x+z and z vs x+y combinations.
I'd like to have a loop doing this since I might have also a higher (eg. 4 datasets, where I'd like to have x vs y+z+w, etc) or a lower (e.g. 2 datasets, x vs y) number of data frames.
Thank you for your help.

Hi,

Here is one implementation:

options(stringsAsFactors = F) #ensure correct merging of data frames

#Example data frames
frame_x = data.frame(a=c(12,10,3), b=c(6,4,2), c=c(3,62,3),factor=c("x","x","x"))
frame_y = data.frame(a=c(2,13,34), b=c(22,13,36), c=c(22,13,34),factor=c("y","y","y"))
frame_z = data.frame(a=c(36,28,11), b=c(32,24,16), c=c(33,22,17),factor=c("z","z","z"))

#You have to specify here which data frames you like to combine
allDf = list(frame_x,frame_y, frame_z)

nDf = length(allDf)

#Merge the data frames, every time different one first
allComparisons = lapply(1:nDf, function(x){
  #Merge all df, but change the order of the first one in the list
  mergeDf = do.call(rbind, allDf[c(x, c(1:nDf)[-x])])
  #Set the new factor names, only keeing the names for the first one
  mergeDf$factor = c(allDf[[x]]$factor, rep("other", nrow(allDf[[x]]) * (nDf - 1)))
  mergeDf
})

#Check result:
allComparisons[1]
[[1]]
   a  b  c factor
1 12  6  3      x
2 10  4 62      x
3  3  2  3      x
4  2 22 22  other
5 13 13 13  other
6 34 36 34  other
7 36 32 33  other
8 28 24 22  other
9 11 16 17  other

allComparisons[2]
[[1]]
   a  b  c factor
1  2 22 22      y
2 13 13 13      y
3 34 36 34      y
4 12  6  3  other
5 10  4 62  other
6  3  2  3  other
7 36 32 33  other
8 28 24 22  other
9 11 16 17  other

I'm sure there also is a Tidyverse implementation of this, but this one was the one that I could think of first. Note that I used lapply to avoid having to use a loop. Also, the do.call function is a way of calling an operation over multiple object in a list, so that's how I could do the rbind over all data frames.

In your real code, there are better ways to assign all data frames to the list allDf, but this depends on how those data is generated (loaded from different files or created in previous code).

Hope this helps,
PJ

1 Like

Dear @pieterjanvc, thank you so much for your help! It completely solved the issue I was dealing with.
Thanks a lot!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.