How to merge multiple data frames in R using a loop?

Hi,

I have multiple .txt files; each contains two columns, an identifier column (Gene) and a sample column. I would like to merge the data frames on the Gene column. I was able to import the files, but I have not been able to merge them using a loop in R. Could you please help me with this?

I usually merge data frames like this:

library(tidyverse)
Merge_All_Samples <- list(df1, df2, df3) %>% reduce(inner_join, by = "Gene")

The code below, which tries to merge the data frames inside a loop, does not work:

library(tidyverse)

all_files <- dir("/Users/Documents/Test/")
file_names <- grep(all_files,pattern = "^G.*.txt$",value = TRUE)

for(i in 1:length(file_names)){
  Data_file <- read.delim(file = paste0("./",file_names[i]), stringsAsFactors = FALSE, check.names = FALSE, row.names = NULL)
  
  ##Merge all samples data

  Merge_All_Samples <- list(Data_file) %>% reduce(inner_join, by = "Gene")
  colnames(Merge_All_Samples)
  
  ##Export the merged data###
  write.csv(Merge_All_Samples, "./Merge_All_Samples.csv", row.names = F)
}

Created on 2021-09-17 by the reprex package (v2.0.1)

dput(GSM1)
structure(list(Gene = c("A1BG", "A1BG-AS1", "A1CF", "A2M", "A2M-AS1"),
    GSM1 = c(4L, 52L, 12L, 645L, 113L)), class = "data.frame", row.names = c(NA, -5L))
#>       Gene GSM1
#> 1     A1BG    4
#> 2 A1BG-AS1   52
#> 3     A1CF   12
#> 4      A2M  645
#> 5  A2M-AS1  113

dput(GSM2)
structure(list(Gene = c("A1BG", "A1BG-AS1", "A1CF", "A2M", "A2M-AS1"),
    GSM2 = c(4L, 57L, 10L, 638L, 161L)), class = "data.frame", row.names = c(NA, -5L))
#>       Gene GSM2
#> 1     A1BG    4
#> 2 A1BG-AS1   57
#> 3     A1CF   10
#> 4      A2M  638
#> 5  A2M-AS1  161

dput(GSM3)
structure(list(Gene = c("A1BG", "A1BG-AS1", "A1CF", "A2M", "A2M-AS1"),
    GSM3 = c(4L, 57L, 10L, 638L, 161L)), class = "data.frame", row.names = c(NA, -5L))
#>       Gene GSM3
#> 1     A1BG    4
#> 2 A1BG-AS1   57
#> 3     A1CF   10
#> 4      A2M  638
#> 5  A2M-AS1  161

Thank you,
Toufiq

Hi @mtoufiq,

I don't think a for loop is needed. I saved each of the three data sets to disk as CSVs, then read them in and merged them as follows:

library(tidyverse)
library(fs)

files <- dir_ls("~/Desktop/", regexp = "GSM\\d+\\.csv$")
files
#> /Users/Matt/Desktop/GSM1.csv /Users/Matt/Desktop/GSM2.csv 
#> /Users/Matt/Desktop/GSM3.csv

map(files, read_csv) %>% 
  reduce(inner_join, by = "Gene")
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   Gene = col_character(),
#>   GSM1 = col_double()
#> )
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   Gene = col_character(),
#>   GSM2 = col_double()
#> )
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   Gene = col_character(),
#>   GSM3 = col_double()
#> )
#> # A tibble: 5 x 4
#>   Gene      GSM1  GSM2  GSM3
#>   <chr>    <dbl> <dbl> <dbl>
#> 1 A1BG         4     4     4
#> 2 A1BG-AS1    52    57    57
#> 3 A1CF        12    10    10
#> 4 A2M        645   638   638
#> 5 A2M-AS1    113   161   161

You do not have to use a for loop at all. Very often in R there is a function to do what you might otherwise do with a loop.

library(tidyverse)

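# Folder holding the .txt files; full.names = TRUE returns complete paths,
# so the files can be read whatever the working directory is
file_names <- dir("/Users/Documents/Test/", pattern = "^G.*\\.txt$", full.names = TRUE)
# Read every file into a list of data frames
Data_file <- map(file_names, read.delim, stringsAsFactors = FALSE, check.names = FALSE, row.names = NULL)
# Join the list on the shared Gene column, then export once
Merge_All_Samples <- Data_file %>% reduce(inner_join, by = "Gene")
write.csv(Merge_All_Samples, "./Merge_All_Samples.csv", row.names = FALSE)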

If you do want to use a loop, define Data_file to be a list of the correct length beforehand and then fill its elements in the loop with all of the individual files. Outside of the loop, use reduce to merge that list into a single data frame.
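
For completeness, here is a rough sketch of that loop version. In the code from the question, list(Data_file) only ever holds the file read in the current iteration, so reduce() has nothing to join and the CSV is overwritten on every pass. The temporary folder (data_dir) and the two small demo files are assumptions made up purely so the sketch runs on its own; in practice you would point data_dir at the folder with your real files.

```r
library(dplyr)  # inner_join()
library(purrr)  # reduce()

# Hypothetical setup: write two small tab-delimited files to a temporary
# folder so the sketch is self-contained (not part of the original data)
data_dir <- file.path(tempdir(), "merge_demo")
dir.create(data_dir, showWarnings = FALSE)
genes <- c("A1BG", "A1BG-AS1", "A1CF")
write.table(data.frame(Gene = genes, GSM1 = c(4L, 52L, 12L)),
            file.path(data_dir, "GSM1.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)
write.table(data.frame(Gene = genes, GSM2 = c(4L, 57L, 10L)),
            file.path(data_dir, "GSM2.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)

# Full paths of the files to read
file_names <- list.files(data_dir, pattern = "^G.*\\.txt$", full.names = TRUE)

# Pre-allocate a list with one slot per file
Data_file <- vector("list", length(file_names))

# Inside the loop: only read, storing each data frame in its own slot
for (i in seq_along(file_names)) {
  Data_file[[i]] <- read.delim(file_names[i], stringsAsFactors = FALSE,
                               check.names = FALSE, row.names = NULL)
}

# Outside the loop: join the whole list on Gene, then export once
Merge_All_Samples <- reduce(Data_file, inner_join, by = "Gene")
write.csv(Merge_All_Samples,
          file.path(data_dir, "Merge_All_Samples.csv"), row.names = FALSE)
```

seq_along(file_names) is used instead of 1:length(file_names) so the loop body is skipped cleanly when no files match the pattern.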

@FJCC thank you very much. This worked.
