Merging Data Frames

Good morning,

I would like to ask you a question regarding the ability to merge different data frames. My problem is that I have two data frames I want to merge but the system does not allow me to do so.

These are the Data Frames

Name Sex ALAT ASAT
Juan M 769 79
Juan M 879 980

Name Sex ALAT
Maria F 8089
Maria F 890

As you may see the second data frame does not have the ASAT values but I would still love merging these two data frames in order to have this result.

Name Sex ALAT ASAT
Juan M 769 79
Juan M 879 980
Maria F 8089 NA
Maria F 890 NA

Please is there any possibility to do this? Thank you for your help

Please see the FAQ: What's a reproducible example (`reprex`) and how do I do one? Using a reprex, complete with representative data will attract quicker and more answers.

Without one, I can only offer that it may be possible to simply rbind and deal with the NAs later

Thank you I will do this

1 Like

Hello

d1 <- data.frame(Name=rep("Juan",2),SEX = rep("M",2),ALAT=c(769,879),ASAT = c(79,980),stringsAsFactors = FALSE)

d2 <- data.frame(Name=rep("Maria",2),SEX = rep("F",2),ALAT=c(8089,890),stringsAsFactors = FALSE)

bind_rows(d1,d2) should provide you with the desired answer

Good morning,

I would like to ask you a question regarding the ability to merge different data frames. My problem is that I have two data frames I want to merge but the system does not allow me to do so.

I would like to merge both data frames and putting NA in the fields that are missing. Is it possible?

library(readr)

Dataset1 <- read_csv("C:/Users/juanp/Desktop/CORONA/EXP1.csv")
#> Parsed with column specification:
#> cols(
#>   Name = col_character(),
#>   Sex = col_character(),
#>   ALAT = col_double(),
#>   ASAT = col_double(),
#>   Bilirrubine = col_double(),
#>   Alpha = col_double()
#> )
View(Dataset1)

Dataset2 <- read_csv("C:/Users/juanp/Desktop/CORONA/EXP2.csv", 
                     col_types = cols(Sex = col_character()))
View(Dataset2)

Created on 2020-03-19 by the reprex package (v0.3.0)

I'm not sure what you mean by

but from your description, it sounds looks like you want to do a full_join(). Here's an illustration using a toy data set.

library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 3.6.3

df1 <- tribble(~Fruit, ~Qty1,
               "Apple", 35,
               "Banana", 10,
               "Orange", 20)

df2 <- tribble(~Fruit, ~Qty2,
               "Apple", 15,
               "Banana", 20,
               "Pear", 40)

full_join(df1, df2, by = "Fruit")
#> # A tibble: 4 x 3
#>   Fruit   Qty1  Qty2
#>   <chr>  <dbl> <dbl>
#> 1 Apple     35    15
#> 2 Banana    10    20
#> 3 Orange    20    NA
#> 4 Pear      NA    40

If this isn't what you mean, please provide more details, preferably with a reproducible example (reprex).

Created on 2020-03-19 by the reprex package (v0.3.0)

Given two data frames with no common column, preventing a join operation, it's possible to cbind and eliminate via dplyr::select duplicated columns. This will result in NAs, of course. I can't be more precise without a reprex with representative data. See the FAQ: What's a reproducible example (`reprex`) and how do I do one? Using a reprex, complete with representative data will attract quicker and more answers.

library(readr)

Dataset1 <- read_csv("C:/Users/juanp/Desktop/CORONA/EXP1.csv")
#> Parsed with column specification:
#> cols(
#>   Name = col_character(),
#>   Sex = col_character(),
#>   ALAT = col_double(),
#>   ASAT = col_double(),
#>   Bilirrubine = col_double(),
#>   Alpha = col_double()
#> )
str(Dataset1)
#> Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 3 obs. of  6 variables:
#>  $ Name       : chr  "JUAN" "JUAN" "JUAN"
#>  $ Sex        : chr  "M" "M" "M"
#>  $ ALAT       : num  15 36 243
#>  $ ASAT       : num  864 4 25
#>  $ Bilirrubine: num  35 35 62
#>  $ Alpha      : num  7 447 34
#>  - attr(*, "spec")=
#>   .. cols(
#>   ..   Name = col_character(),
#>   ..   Sex = col_character(),
#>   ..   ALAT = col_double(),
#>   ..   ASAT = col_double(),
#>   ..   Bilirrubine = col_double(),
#>   ..   Alpha = col_double()
#>   .. )
Dataset2 <- read_csv("C:/Users/juanp/Desktop/CORONA/EXP2.csv", 
                     col_types = cols(Sex = col_character()))
str(Dataset2)
#> Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 3 obs. of  4 variables:
#>  $ Name      : chr  "MARIA" "MARIA" "MARIA"
#>  $ Sex       : chr  "F" "F" "F"
#>  $ ALAT      : num  62 47 9654
#>  $ Creatinine: num  26 2 63
#>  - attr(*, "spec")=
#>   .. cols(
#>   ..   Name = col_character(),
#>   ..   Sex = col_character(),
#>   ..   ALAT = col_double(),
#>   ..   Creatinine = col_double()
#>   .. )

Created on 2020-03-19 by the reprex package (v0.3.0)

Is it better like this?

No, that doesn't help since we don't have the CSVs you are reading in.

If you don't mind sharing a few samples of your data, please post the output of the two commands below. That will allow us to recreate a subset of your data frames.

dput(head(Dataset1, n = 20L))
dput(head(Dataset2, n = 20L))

I have written the code again with the commands you have given me. Is it ok right now? Excuse me is I am making mistakes, it is my first time using REPLEX.


library(readr)
Dataset1 <- read_csv("C:/Users/juanp/Desktop/CORONA/EXP1.csv")
#> Parsed with column specification:
#> cols(
#>   Name = col_character(),
#>   Sex = col_character(),
#>   ALAT = col_double(),
#>   ASAT = col_double(),
#>   Bilirrubine = col_double(),
#>   Alpha = col_double()
#> )


Dataset2 <- read_csv("C:/Users/juanp/Desktop/CORONA/EXP2.csv", 
                     col_types = cols(Sex = col_character()))

# These are the tables with the commands you have given me
dput(head(Dataset1, n = 20L))
#> structure(list(Name = c("JUAN", "JUAN", "JUAN"), Sex = c("M", 
#> "M", "M"), ALAT = c(15, 36, 243), ASAT = c(864, 4, 25), Bilirrubine = c(35, 
#> 35, 62), Alpha = c(7, 447, 34)), class = c("tbl_df", "tbl", "data.frame"
#> ), row.names = c(NA, -3L))
dput(head(Dataset2, n = 20L))
#> structure(list(Name = c("MARIA", "MARIA", "MARIA"), Sex = c("F", 
#> "F", "F"), ALAT = c(62, 47, 9654), Creatinine = c(26, 2, 63)), class = c("tbl_df", 
#> "tbl", "data.frame"), row.names = c(NA, -3L))

#These are the same tables in a more visual way
Dataset1
#> # A tibble: 3 x 6
#>   Name  Sex    ALAT  ASAT Bilirrubine Alpha
#>   <chr> <chr> <dbl> <dbl>       <dbl> <dbl>
#> 1 JUAN  M        15   864          35     7
#> 2 JUAN  M        36     4          35   447
#> 3 JUAN  M       243    25          62    34

Dataset2
#> # A tibble: 3 x 4
#>   Name  Sex    ALAT Creatinine
#>   <chr> <chr> <dbl>      <dbl>
#> 1 MARIA F        62         26
#> 2 MARIA F        47          2
#> 3 MARIA F      9654         63

Created on 2020-03-19 by the reprex package (v0.3.0)

That helps; thank you.

I think I understand now what you mean about putting NA in the fields that are missing. Dataset1 contains two variables ASAT and Billirrubine which are not present in Dataset2. Likewise, Dataset2 contains an extra variable Creatinine.

In this case, you could simply append the two data frames with bind_rows(). That will fill NA values for the missing fields.

library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 3.6.3

Dataset1 <- structure(
  list(
    Name = c("JUAN", "JUAN", "JUAN"),
    Sex = c("M", "M", "M"),
    ALAT = c(15, 36, 243),
    ASAT = c(864, 4, 25),
    Bilirrubine = c(35, 35, 62),
    Alpha = c(7, 447, 34)),
  class = c("tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -3L)
)

Dataset2 <- structure(
  list(
    Name = c("MARIA", "MARIA", "MARIA"),
    Sex = c("F", "F", "F"),
    ALAT = c(62, 47, 9654),
    Creatinine = c(26, 2, 63)),
  class = c("tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -3L))

bind_rows(Dataset1, Dataset2)
#> # A tibble: 6 x 7
#>   Name  Sex    ALAT  ASAT Bilirrubine Alpha Creatinine
#>   <chr> <chr> <dbl> <dbl>       <dbl> <dbl>      <dbl>
#> 1 JUAN  M        15   864          35     7         NA
#> 2 JUAN  M        36     4          35   447         NA
#> 3 JUAN  M       243    25          62    34         NA
#> 4 MARIA F        62    NA          NA    NA         26
#> 5 MARIA F        47    NA          NA    NA          2
#> 6 MARIA F      9654    NA          NA    NA         63

Created on 2020-03-19 by the reprex package (v0.3.0)

Is this what you are trying to achieve?

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.