merge two data frames - without creating new columns

Hello,

I seperated one data frame into two data frames and calculated "f1" and "f2" for both data frames seperately.

head(data1)

ID light lux time rdom f1 f2
1 11 26 19 12 2.1 34
4 11 26 19 12 22.9 31
6 11 26 19 12 6.6 7

head(data2)

ID light lux time rdom f1 f2
2 11 26 19 12 2.8 24
3 11 26 19 12 24.3 34
5 11 26 19 12 9.2 5

No I want to merge the dataframes back together so that there is a data set with all ID´s in it. I want to keep the columns "light", "lux", "time" & "rdom" only once, as they are the same in both dataframes. Also I want to sort the data by the ID. So basically I want the final data frame to look like that:

ID light lux time rdom f1 f2
1 11 26 19 12 2.1 34
2 11 26 19 12 2.8 24
3 11 26 19 12 24.3 34
4 11 26 19 12 22.9 31
5 11 26 19 12 9.2 5
6 11 26 19 12 6.6 7

In order to do so I used the function merge():
Data.final <- merge(data1, select(data2, -light, -lux, -time, -rdom), by = "ID", all.x=TRUE)

Following data set results:
ID light lux time rdom f1.x f2.x f1.y f2.y
1 11 26 19 12 2.1 34 NA NA
2 11 26 19 12 NA NA 2.8 24
3 11 26 19 12 NA NA 24.3 34
4 11 26 19 12 22.9 31 NA NA
5 11 26 19 12 NA NA 9.2 5
6 11 26 19 12 6.6 7 NA NA

Is there a way to avoid those extra columns (.x, .y) and NAs?

edit: Sorry for the relocated columns. I sadly don´t know how to fix that..

Thank you so much in advance!

Both data frames have the same columns and independent rows so you don't want to "merge" them, you want to "bind" them, take a look at this example:

library(dplyr)

# Sample data in a copy/paste friendly format, replace this with your own data frames
data1 <- data.frame(
          ID = c(1, 4, 6),
       light = c(11, 11, 11),
         lux = c(26, 26, 26),
        time = c(19, 19, 19),
        rdom = c(12, 12, 12),
          f1 = c(2.1, 22.9, 6.6),
          f2 = c(34, 31, 7)
)

data2 <- data.frame(
          ID = c(2, 3, 5),
       light = c(11, 11, 11),
         lux = c(26, 26, 26),
        time = c(19, 19, 19),
        rdom = c(12, 12, 12),
          f1 = c(2.8, 24.3, 9.2),
          f2 = c(24, 34, 5)
)

# Relevant code
data1 %>% 
    bind_rows(data2) %>% 
    arrange(ID)
#>   ID light lux time rdom   f1 f2
#> 1  1    11  26   19   12  2.1 34
#> 2  2    11  26   19   12  2.8 24
#> 3  3    11  26   19   12 24.3 34
#> 4  4    11  26   19   12 22.9 31
#> 5  5    11  26   19   12  9.2  5
#> 6  6    11  26   19   12  6.6  7

Created on 2022-08-03 by the reprex package (v2.0.1)

This function solves the problem with the columns but somehow doubles some of my IDs.
For example ID No. 12 is listed twice. One row contains the "normal values"/numbers, the other row contains only NA´s.
Do you know how to avoid those doubled IDs?

Thank you!

That is not what bind_rows() does, if you get duplicated IDs is because you actually have the same ID in both data frames.

If you need more specific help, please provide a proper REPRoducible EXample (reprex) that actually shows your issue.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.