Merge Not Converging on Key

Hello,

Long story short I'm trying to merge two databases by using "IDs" as the key. Although they are refusing to converge correctly. In other words they just end up stacking (screenshot) when I use full_join or merge commands. Similarly, I tried to run it another way using the cbind command and that throws everything at the end of the database but it does not collapse them based on the key.

Thanks

dataa<- tibble::tribble(
            ~IDs, ~Age,  ~Gender,     ~race_eth,                     ~relationship_status,
          "SS35",  22L, "Female",       "White", "Single, not in a romantic relationship",
          "SS34",  18L, "Female",       "White", "Single, not in a romantic relationship",
          "SS33",  20L, "Female",       "White",                                "Widowed",
          "SS02",  21L, "Female", "Multiracial",      "Not married but in a relationship",
          "SS08",  22L, "Female",    "Hispanic",      "Not married but in a relationship",
          "SS10",  21L, "Female",       "White", "Single, not in a romantic relationship",
          "SS19",  19L, "Female",       "White",      "Not married but in a relationship",
          "SS22",  20L, "Female",       "Other", "Single, not in a romantic relationship",
          "SS26",  21L, "Female",    "Hispanic", "Single, not in a romantic relationship",
          "SS27",  21L, "Female",       "White",      "Not married but in a relationship"
          )
head(dataa)
#> # A tibble: 6 × 5
#>   IDs     Age Gender race_eth    relationship_status                   
#>   <chr> <int> <chr>  <chr>       <chr>                                 
#> 1 SS35     22 Female White       Single, not in a romantic relationship
#> 2 SS34     18 Female White       Single, not in a romantic relationship
#> 3 SS33     20 Female White       Widowed                               
#> 4 SS02     21 Female Multiracial Not married but in a relationship     
#> 5 SS08     22 Female Hispanic    Not married but in a relationship     
#> 6 SS10     21 Female White       Single, not in a romantic relationship
dataab<- tibble::tribble(
              ~IDs, ~StartStress, ~RetBase, ~HRT_00_00, ~HRT_00_01, ~HRT_00_02, ~HRT_00_03, ~HRT_00_04, ~HRT_00_05,
            "SS34",      "25_14",  "21_20",        56L,        55L,        55L,        54L,        52L,        51L,
            "SS35",      "26_55",  "22_15",        76L,        76L,        76L,        77L,        78L,        78L,
            "SS33",      "25_34",  "21_25",       103L,       103L,       103L,       103L,       103L,       104L,
            "SS02",      "25_50",  "22_20",       105L,       104L,       112L,       112L,       113L,       100L,
           "SS26",      "26_23",  "21_45",        47L,        47L,        47L,        47L,        45L,        45L,
           "SS27",      "26_00",  "22_35",        53L,        53L,        53L,        53L,        53L,        52L,
            "SS08",      "25_00",  "19_40",        80L,        80L,        80L,        80L,        81L,        82L,
            "SS19",      "26_15",  "22_50",        94L,        81L,        80L,        74L,        74L,        74L,
            "SS10",      "26_45",  "22_05",        96L,        96L,        96L,        96L,        93L,        90L,
           "SS22",      "25_20",  "21_15",        61L,        66L,        66L,        66L,        66L,        71L,
            "CD11",      "25_40",  "22_00",        48L,        49L,        49L,        49L,        49L,        49L
           )
head(dataab)
#> # A tibble: 6 × 9
#>   IDs   StartStress RetBase HRT_00_00 HRT_00_01 HRT_00_02 HRT_00_03 HRT_00_04
#>   <chr> <chr>       <chr>       <int>     <int>     <int>     <int>     <int>
#> 1 SS34  25_14       21_20          56        55        55        54        52
#> 2 SS35  26_55       22_15          76        76        76        77        78
#> 3 SS33  25_34       21_25         103       103       103       103       103
#> 4 SS02  25_50       22_20         105       104       112       112       113
#> 5 SS26  26_23       21_45          47        47        47        47        45
#> 6 SS27  26_00       22_35          53        53        53        53        53
#> # … with 1 more variable: HRT_00_05 <int>

Try this:

data <- full_join(dataaa,dataab,by="IDs")

Specifying the ID column should get you the result you're after.

I should have posted what I have tried already (see screenshot). I've tried that and the merge function as well with specifying the Key (i.e., using by=)

Can you share a screenshot of what mergedd looks like?

This produces the expected result for a full join with the sample data you have provided

library(dplyr)

dataa <- tibble::tribble(
    ~IDs, ~Age,  ~Gender,     ~race_eth,                     ~relationship_status,
    "SS35",  22L, "Female",       "White", "Single, not in a romantic relationship",
    "SS34",  18L, "Female",       "White", "Single, not in a romantic relationship",
    "SS33",  20L, "Female",       "White",                                "Widowed",
    "SS02",  21L, "Female", "Multiracial",      "Not married but in a relationship",
    "SS08",  22L, "Female",    "Hispanic",      "Not married but in a relationship",
    "SS10",  21L, "Female",       "White", "Single, not in a romantic relationship",
    "SS19",  19L, "Female",       "White",      "Not married but in a relationship",
    "SS22",  20L, "Female",       "Other", "Single, not in a romantic relationship",
    "SS26",  21L, "Female",    "Hispanic", "Single, not in a romantic relationship",
    "SS27",  21L, "Female",       "White",      "Not married but in a relationship"
)

dataab <- tibble::tribble(
    ~IDs, ~StartStress, ~RetBase, ~HRT_00_00, ~HRT_00_01, ~HRT_00_02, ~HRT_00_03, ~HRT_00_04, ~HRT_00_05,
    "SS34",      "25_14",  "21_20",        56L,        55L,        55L,        54L,        52L,        51L,
    "SS35",      "26_55",  "22_15",        76L,        76L,        76L,        77L,        78L,        78L,
    "SS33",      "25_34",  "21_25",       103L,       103L,       103L,       103L,       103L,       104L,
    "SS02",      "25_50",  "22_20",       105L,       104L,       112L,       112L,       113L,       100L,
    "SS26",      "26_23",  "21_45",        47L,        47L,        47L,        47L,        45L,        45L,
    "SS27",      "26_00",  "22_35",        53L,        53L,        53L,        53L,        53L,        52L,
    "SS08",      "25_00",  "19_40",        80L,        80L,        80L,        80L,        81L,        82L,
    "SS19",      "26_15",  "22_50",        94L,        81L,        80L,        74L,        74L,        74L,
    "SS10",      "26_45",  "22_05",        96L,        96L,        96L,        96L,        93L,        90L,
    "SS22",      "25_20",  "21_15",        61L,        66L,        66L,        66L,        66L,        71L,
    "CD11",      "25_40",  "22_00",        48L,        49L,        49L,        49L,        49L,        49L
)

full_join(dataa,dataab,by="IDs")
#> # A tibble: 11 × 13
#>    IDs     Age Gender race_eth    relationship_st… StartStress RetBase HRT_00_00
#>    <chr> <int> <chr>  <chr>       <chr>            <chr>       <chr>       <int>
#>  1 SS35     22 Female White       Single, not in … 26_55       22_15          76
#>  2 SS34     18 Female White       Single, not in … 25_14       21_20          56
#>  3 SS33     20 Female White       Widowed          25_34       21_25         103
#>  4 SS02     21 Female Multiracial Not married but… 25_50       22_20         105
#>  5 SS08     22 Female Hispanic    Not married but… 25_00       19_40          80
#>  6 SS10     21 Female White       Single, not in … 26_45       22_05          96
#>  7 SS19     19 Female White       Not married but… 26_15       22_50          94
#>  8 SS22     20 Female Other       Single, not in … 25_20       21_15          61
#>  9 SS26     21 Female Hispanic    Single, not in … 26_23       21_45          47
#> 10 SS27     21 Female White       Not married but… 26_00       22_35          53
#> 11 CD11     NA <NA>   <NA>        <NA>             25_40       22_00          48
#> # … with 5 more variables: HRT_00_01 <int>, HRT_00_02 <int>, HRT_00_03 <int>,
#> #   HRT_00_04 <int>, HRT_00_05 <int>

Created on 2022-07-14 by the reprex package (v2.0.1)

Can you explain in which way this differs from the result you are expecting? ideally, Can you provide a proper REPRoducible EXample (reprex) illustrating your issue.

1 Like

That so strange, I feel like I tried that 20 times. Anyways, that worked. Thanks.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.