Merge data frames in a loop

RSola · July 10, 2019, 1:22pm

I have "n" data frames to merge. I would like to create a process to do it automatically.

I've tried to do it in a loop:

df_merge <- first_dataf

# the data frames are called df_2, df_3, ..., df_n

for (i in 2:n) {
  next_df <- paste0("df_", i)
  df_merge <- rbind(df_merge, next_df)
}

The problem is that next_df is a character string, and I need it to be one of the data frames that I have loaded in R.

Thank you!!

pieterjanvc · July 10, 2019, 1:58pm

Use the eval function to make R evaluate the name of the data frame as the real data frame:

next_df <- eval(parse(text=paste("df_", i, sep="")))

Make it even easier by not using the loop at all:

df_merge <- eval(parse(text=paste("rbind(", paste("df_", 1:n, sep = "", collapse = ", "), ")")))
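As a minimal sketch of both suggestions above, here is a self-contained run on three small made-up data frames. The names df_1 to df_3 follow the question, but their contents and n <- 3 are assumptions for illustration only.

```r
# Hypothetical example data: three small data frames named df_1 ... df_3
df_1 <- data.frame(id = 1:2, value = c(10, 20))
df_2 <- data.frame(id = 3:4, value = c(30, 40))
df_3 <- data.frame(id = 5:6, value = c(50, 60))
n <- 3

# Loop version: turn each name into the object it refers to, then bind it on
df_merge <- df_1
for (i in 2:n) {
  next_df <- eval(parse(text = paste("df_", i, sep = "")))
  df_merge <- rbind(df_merge, next_df)
}

# One-call version: build the text "rbind( df_1, df_2, df_3 )" and evaluate it
df_merge_2 <- eval(parse(text = paste("rbind(", paste("df_", 1:n, sep = "", collapse = ", "), ")")))

nrow(df_merge)    # rows from all three data frames
nrow(df_merge_2)
```

Base R's get() performs the same name-to-object lookup without parsing text as code, so get(paste0("df_", i)) could stand in for the eval(parse(...)) call inside the loop.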

Yarnabrina · July 10, 2019, 3:33pm

Along with the above solution, another possibility is to use Reduce, assuming the data frames are named df_1, df_2, ..., df_n:

Reduce(f = function(t1, t2) {rbind(t1, eval(expr = parse(text = t2)))},
       x = paste("df", 1:n, sep = "_"),
       init = data.frame())
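To see what the Reduce call above does, here is a small sketch on made-up data (the contents of df_1 to df_3 and n <- 3 are assumptions). Reduce works through x from left to right, so the rows end up stacked in the order in which the names appear in x.

```r
# Hypothetical example data: one-row data frames so the order is easy to read
df_1 <- data.frame(id = 1, src = "first")
df_2 <- data.frame(id = 2, src = "second")
df_3 <- data.frame(id = 3, src = "third")
n <- 3

df_names <- paste("df", 1:n, sep = "_")   # "df_1" "df_2" "df_3"

# Start from an empty data frame and bind on one named data frame per step
merged <- Reduce(f = function(t1, t2) {rbind(t1, eval(expr = parse(text = t2)))},
                 x = df_names,
                 init = data.frame())

merged$id   # ids appear in the same order as df_names
```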

nathania, seeing your answer below, can I ask you a question?

Suppose you have data frames (or tibbles) named df_1, ..., df_n, as per the original question. Will this method merge the data frames in order? That is, how do you ensure that merging starts with df_1, then df_2, then df_3, and so on?

And a note: ls(pattern = "bad") probably does the same thing as names(.GlobalEnv) %>% str_subset('bad'). Its ordering is also easier to predict, because ls() sorts the names by default.
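One caveat with relying on the sorted output of ls() is that names are sorted as text, not by their numeric suffix. The sketch below illustrates this in a separate environment (the environment e and the data frames df_1, df_2, df_10 are hypothetical), and then builds the names from the numeric index so that the merge order is explicit.

```r
# Hypothetical setup: a separate environment holding df_1, df_2 and df_10
e <- new.env()
for (i in c(1, 2, 10)) {
  assign(paste0("df_", i), data.frame(id = i), envir = e)
}

# ls() sorts names as text, so df_10 is listed before df_2
sorted_names <- ls(e, pattern = "^df_")
sorted_names

# Build the names from the numeric index to control the order explicitly,
# fetch the objects with mget(), and bind them in that order
ordered_names <- paste0("df_", c(1, 2, 10))
merged <- do.call(rbind, mget(ordered_names, envir = e))
merged$id
```

Generating the names with paste0() from the index, as the earlier answers do, avoids depending on how any listing function happens to order them.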

nathania · July 10, 2019, 11:57pm

Hi! This type of approach might be overkill for your use-case, but it's helped a few students turn in their problem sets on time.

library(tidyverse)

# dummy data
bad_name_1 <- starwars
bad_nm2 <- starwars
another_bad_name <- starwars
bad_number <- 1234
names_r_hard <- 1:10

bad_dfs <- names(.GlobalEnv) %>% 
  str_subset('bad') %>% 
  map_chr(~ if_else('tbl_df' %in% class(pluck(.GlobalEnv, .x)), .x, NA_character_)) %>% 
  map_dfr(~ pluck(.GlobalEnv, .x))

nrow(bad_dfs) == (3 * nrow(starwars))
#> [1] TRUE

Created on 2019-07-10 by the reprex package (v0.3.0.9000)

nathania · July 11, 2019, 1:35pm

That's an interesting question! I'm not sure how names(.GlobalEnv) is ordered. If the original grouping of observations is meaningful, you could modify the last lambda function to ~ pluck(.GlobalEnv, .x) %>% add_column(src_id = .x). As you suggested, ls(pattern = 'bad') is also a good option when objects have been named consistently.
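For readers without the tidyverse loaded, the idea of tagging each row with its source can be sketched in base R. This swaps pluck() and add_column() for get() and plain column assignment; the two small data frames below are made-up stand-ins for the starwars copies, and only the column name src_id comes from the suggestion above.

```r
# Hypothetical stand-ins for the data frames in the thread
bad_name_1 <- data.frame(x = 1:2)
bad_nm2 <- data.frame(x = 3:4)
df_names <- c("bad_name_1", "bad_nm2")

# Tag each data frame with the name it came from, then stack them
tagged <- lapply(df_names, function(nm) {
  df <- get(nm)       # look the object up by name
  df$src_id <- nm     # record where these rows came from
  df
})
bad_dfs <- do.call(rbind, tagged)

table(bad_dfs$src_id)  # row count per source data frame
```

With src_id in place, the rows can be regrouped or reordered after merging, so the order in which the names were listed matters less.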