Merging multiple data frames by specific column names

adikii · April 22, 2021, 8:40am

Hi everybody!

I have 4 data frames that I want to merge to 1 big data frame.
They all have the same structure and same amount of columns, but some column names are a bit different, for example:
data frame 1- column name 1="dollar"
data frame 2- column name 1="Dollar"

I tried to use: colnames(df1)=colnames(df2), but each time I run the code it choses the names arbitrarily, once from data frame 1 and the next time from data frame 2.

Is there a way to merge the data frames and specify from which of the 4 data frames I want to take the column names?

Thanks!!

Dobrokhotov1989 · April 22, 2021, 10:01am

Could you provide a reprex of this behavior?
I've tried it and it works just fine (the same way every time):

df1 <- data.frame(col1 = c(1, 2, 3),
                  col2 = c(2, 2, 2),
                  col3 = c("A", "B", "C"))

df2 <- data.frame(Co1 = c(5, 6, 7),
                  Co2 = c(7, 3, 4),
                  Co3 = c("a", "l", "p"))

colnames(df1)
#> [1] "col1" "col2" "col3"

colnames(df1) = colnames(df2)

colnames(df1)
#> [1] "Co1" "Co2" "Co3"

Leon · April 22, 2021, 11:50am

Try playing a bit around with this joining example:

library("tidyverse")
set.seed(89718)
df_A <- data.frame(
  dollar = sample(1:10, 5),
  val_A = runif(5)
)
df_B <- data.frame(
  Dollar = sample(1:10, 5),
  val_B = runif(5)
)
df_C <- data.frame(
  dollaR = sample(1:10, 5),
  val_C = runif(5)
)
df_D <- data.frame(
  doLlar = sample(1:10, 5),
  val_D = runif(5)
)
df_A %>%
  full_join(df_B,
            by = c("dollar" = "Dollar")) %>% 
  full_join(df_C,
            by = c("dollar" = "dollaR")) %>% 
  full_join(df_D,
            by = c("dollar" = "doLlar"))

Hope it helps!

jlacko · April 22, 2021, 3:28pm

If the only difference is in case you may consider standardizing the colnames() to lowercase (or uppercase, for all that matters) - and then join as required, without having to bother.

Consider this code, using the built in iris dataset (which follows a cruel and unusual naming convention).

head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa 

colnames(iris) <- tolower(colnames(iris))
head(iris)

  sepal.length sepal.width petal.length petal.width species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

adikii · April 22, 2021, 8:03pm

Hi,
Thanks!!

Can you please explain the meaning of sample (1:10, 5) and runif(5)? It mean to use the 5th column?

Also, how do I need to change the code if I need to do this to more columns?

Thanks again!!

Leon · April 23, 2021, 5:44am

Reg. the code snippets you a ask about, run this in your console: ?sample, ?runif and then the actual code, i.e. sample(1:10, 5) and runif(5). For a better understanding of relational data, read this chapter: https://r4ds.had.co.nz/relational-data.html

system · May 14, 2021, 5:44am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.