How to merge (by rows) more than two datasets with different columns

I've prepared the following simple Shiny app aiming to import multiple datasets with that contain different columns. I'd like to merge them using a loop since I could upload more than two datasets and I'd like to discard the unmatching columns.
The data I'd like to upload, as an example, are the following:

data_A <- data.frame(
      Sample = as.factor(c("x_1", "x_2", "x_3")),
      Pop = as.factor(c("FF", "FF", "FF")),
         aaa = c(31L, 27L, 17L),
       aaa.1 = c(3L, 7L, 19L),
         bbb = c(10L, 31L, 37L),
       bbb.1 = c(23L, 1L, 13L),
         ccc = c(20L, 18L, 37L),
       ccc.1 = c(9L, 12L, 10L),
         ddd = c(3L, 6L, 28L),
       ddd.1 = c(32L, 27L, 27L),
         eee = c(34L, 33L, 3L),
       eee.1 = c(38L, 13L, 17L),
         fff = c(21L, 26L, 40L),
       fff.1 = c(38L, 4L, 28L)
)

data_B <- data.frame(
      Sample = as.factor(c("x_4", "X_5", "X_6")),
      Pop = as.factor(c("GG", "GG", "GG")),
         aaa = c(21L, 24L, 24L),
       aaa.1 = c(9L, 1L, 36L),
         ccc = c(3L, 7L, 30L),
       ccc.1 = c(37L, 13L, 23L),
         fff = c(6L, 6L, 3L),
       fff.1 = c(11L, 3L, 30L)
)

data_C <- data.frame(
        Sample = as.factor(c("x_7", "x_8", "x_9")),
        Pop = as.factor(c("ZZ", "ZZ", "ZZ")),
         aaa = c(31L, 27L, 17L),
       aaa.1 = c(3L, 7L, 19L),
         bbb = c(10L, 31L, 37L),
       bbb.1 = c(23L, 1L, 13L),
         ccc = c(20L, 18L, 37L),
       ccc.1 = c(9L, 12L, 10L),
         fff = c(21L, 26L, 40L),
       fff.1 = c(38L, 4L, 28L)
)

and I'd like to obtain the following final dataset:

data_def <- data.frame(
        Sample = as.factor(c("x_1", "x_2", "x_3", "x_4", "X_5", "X_6", "x_7","x_8", "x_9")),
      Pop = as.factor(c("FF", "FF", "FF", "GG", "GG", "GG", "ZZ", "ZZ","ZZ")),
         aaa = c(31L, 27L, 17L, 21L, 24L, 24L, 31L, 27L, 17L),
       aaa.1 = c(3L, 7L, 19L, 9L, 1L, 36L, 3L, 7L, 19L),
         ccc = c(20L, 18L, 37L, 3L, 7L, 30L, 20L, 18L, 37L),
       ccc.1 = c(9L, 12L, 10L, 37L, 13L, 23L, 9L, 12L, 10L),
         fff = c(21L, 26L, 40L, 6L, 6L, 3L, 21L, 26L, 40L),
       fff.1 = c(38L, 4L, 28L, 11L, 3L, 30L, 38L, 4L, 28L)
)

So, I prepared the following function and I tried to use intersect function, but I noticed it works only in case of two datasets, so I'd like to use it in a loop since I'd need to merge event more than two datasets. Here I report an example with 3 datasets, but I'd like to extend it to a N number of uploaded datasets:

datasets <- list(data_A,data_B,data_C)
numfiles = length(datasets)
mydata = datasets
variables = list()
for (i in 1:numfiles)
{
  variables[[i]] = c(colnames(mydata[[i]]))
}
InAll = lapply(variables, FUN = intersect)
do.call(rbind, mydata[,inAll])

I tried different solutions but I could not achieve my goal. I'd be very grateful if anybody could help me with this matter!

Hello. I understand you ultimately want to add this functionality to a shiny app. But there is a lot of 'overhead' in considering your code example with uneccesary shiny parts that are not a fundamental part of your issue.

I.e. if you can transform the datasets in a script then you can also in a shiny app.

Please consider revising your example so that it's more minimal. You are more likely to get useful help this way :slight_smile:

Dear @nirgrahamuk, thank you very much for your advice! I edited the post, I hope now it is more minimal and clear!

Thanks
Here is a possible solution for you to consider. Hope it helps.
(it assumes your data_* from above)

library(tidyverse)
mydata <- list(data_A,data_B,data_C)
variables <- map(mydata,names)

##collates all the common names
InAll <-Reduce(f = intersect,x = variables)
#reform the data list to dataframes with only the column names above
my_selected_data <- map(mydata,~select(.,all_of(InAll)))
#stack them all vertically
final <-Reduce(f=union_all,x = my_selected_data)
#the target has strings as factors, so remake this
final_with_fact <- mutate_if(final,
                             is.character,as_factor)
#test achieved target
all.equal(data_def,final_with_fact)
# [1] TRUE

note: the above uses base::Reduce, but purrr::reduce is also available in tidyverse. That can use the pipes.


library(tidyverse)
mydata <- list(data_A,data_B,data_C)
variables <- map(mydata,names)

##collates all the common names
InAll <- variables %>% reduce(intersect)
#reform the data list to column with only the column names
my_selected_data <- map(mydata,~select(.,all_of(InAll)))
#stack them all vertically
final <- my_selected_data %>% reduce(union_all)
#the target has strings as factors, so remake this
final_with_fact <- mutate_if(final,
                             is.character,as_factor)
#test achieved target
all.equal(data_def,final_with_fact)
# [1] TRUE
1 Like

Dear @nirgrahamuk, thank you for you help and assistance.
I tested the code you prepared but I obtained the following error:

Error in all_of(InAll) : could not find function "all_of"

I obtained it even writing library(tidyselect), could you fix this issues, too?

I have version 1.0.0, which do you have ?

packageVersion("tidyselect")
[1] ‘1.0.0’

p.s. the all_of() function can be removed and the code should work, it just removes a tidyverse warning about ambiguity... but I think if you install the latest version, you will have this function :slight_smile:

Dear @nirgrahamuk,
now it works, I updated the tidyselect version of the package I had.
Thank you very much for your constant help and sorry for my continue silly questions!

I'm here to become a better R programmer, and your question was interesting to solve.
Thanks for asking it :slight_smile:

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.