How do I combine two dataframes with different number of rows and columns

Hi everyone,

I am a beginner in R and I have to do a project, please if someone knows about how to combine two data frames with different number of rows and columns.

Thank you, I will appreciate very much.

Regards,
Milagros

If they have a common "key" you can use a "join" function

https://dplyr.tidyverse.org/reference/join.html

If you need more specific help, please provide a proper REPRoducible EXample (reprex) illustrating your issue.

glimpse(df1)

glimpse(df1)
Observations: 10,170
Variables: 5
Patient <int> 1369, 1410, 1156, 663, 1198, 740, 574, 787, 623, 1116, 956, 568, 522, 14… Weight 94.89, 64.56, 119.04, 61.56, 50.42, 59.68, 51.01, 75.96, 101.26, 94.55, …
Height <dbl> 1.59, 1.67, 1.49, 1.86, 1.57, 1.86, 1.84, 1.62, 1.84, 1.46, 1.74, 1.55, … IMC 37.53, 23.15, 53.62, 17.79, 20.46, 17.25, 15.07, 28.94, 29.91, 44.36, 16…
$ Date 2013-01-01, 2013-01-01, 2013-01-01, 2013-01-01, 2013-01-01, 2013-01-01,…

glimpse(df2)
Observations: 10,167
Variables: 6
Patient <int> 1369, 1410, 1156, 663, 1198, 740, 574, 787, 623, 1116, 956, 568,… Systolic 113, 91, 91, 114, 100, 96, 111, 132, 141, 107, 84, 94, 116, 121,…
Diastolic <int> 93, 87, 58, 73, 60, 59, 87, 56, 82, 75, 55, 76, 63, 51, 63, 74, … AvBloodPressure 93, 110, 92, 121, 99, 92, 109, 133, 83, 91, 90, 121, 109, 131, 8…
HeartRate <int> 109, 99, 93, 62, 61, 106, 53, 101, 78, 93, 56, 110, 109, 100, 55… Date 2013-01-01, 2013-01-01, 2013-01-01, 2013-01-01, 2013-01-01, 201…

Hi,
Dataframe df1 has 10,170 and df2 has 10,167 with different variables. The only common variables are Patient and Date.
Please, how do I combine these two data frames?

Please read the link I gave you before and try to provide data on a copy/paste friendly format (also if this is real data, please make sure it is anonymized)

#Abrir archivos

BloodPressure <- read.csv(file.choose())
View(BloodPressure)
Glucose <- read.csv(file.choose())
View(Glucose)
Oximetry <- read.csv(file.choose())
View(Oximetry)
Weight_Height <- read.csv(file.choose())
View(Weight_Height)
unique(BloodPressure)
View(BloodPressure)
unique(Weight_Height)
View(Weight_Height)

#Join

dfnewq<-right_join(df1, df2, by = NULL, copy = FALSE, suffix = c("Patient","Date"))
View(dfnewq)

merge

dfnewq<-merge(df1, df2, by = c("Patient","Date"))
View(dfnewq)

na.omit

df11<-na.omit(df1)
df22<-na.omit(df2)

#Boxplot

boxplot(datosWeight$Weight,
main = "Weight",
boxwex = 0.5,col="blue")

#cbind
df<-cbind.data.frame(df1,df2)
View(df)

#Glucose con formato de fecha y ordenado

Glucose$Date<-as.Date(as.character(Glucose$Date), format="%m/%d/%Y")
View(Glucose)
df3<-arrange(Glucose,Patient,Date)
View(df3)

#Oximetry con formato de fecha y ordenado

Oximetry$Date<-as.Date(as.character(Oximetry$Date), format="%m/%d/%Y")
View(Oximetry)
df4<-arrange(Oximetry,Patient,Date)
View(df4)

#Esta integracion no esta bien
df5 = merge(df1, df2, by.x=c("Patient", "Date"))
View(df5)
df5<-unique(df5)
View(df5)

newdf <- df2 %>% right_join(df1, by=c("Patient","Date"))
View(newdf)

m3 <-merge(df1, df2, by.x = "Patient", by.y = "Patient", all.x = FALSE)
View(m3)

m3 <-full_join(df1, df2, by= "Patient")
View(m3)

total <- merge(df1,df2,by=c("Patient","Date"))
View(total)

total <- left_join(df2,df1,by=c("Patient","Date"),all.x=TRUE)
View(total)

df1 is attached below.
Please let me know how to join these both data frames.

All I can see is a screenshot of a dataframe, I can't copy data from that, please read the guide and use a copy/paste friendly format.

Here is the Spanish version in case you prefer

Thank you very much, I am sending data:
datapasta::df_paste(head(df1, 10)[, c("Patient","Date","Weight","Height")])data.frame(
Patient = c(1L,
1L,1L,1L,1L,2L,
2L,2L,2L,2L),
Date = c("2014-02-24",
"2014-11-13","2015-04-30",
"2014-08-29",
"2015-04-30","2015-01-09",
"2015-01-01",
"2015-01-13","2015-01-05",
"2013-03-13"),
Weight = c(47.6,
48.1,49.2,49.7,
76,46.06,46.07,
46.53,46.85,110.53),
Height = c(1.57,
1.57,1.52,1.55,
1.54,1.58,1.58,
1.58,1.58,1.82)
)

datapasta::df_paste(head(df2, 10)[, c("Patient","Date","Systolic","Diastolic","HeartRate")])data.frame(
Patient = c(1L,1L,
1L,1L,2L,2L,2L,
2L,2L,2L),
Date = c("2014-02-24","2014-11-13",
"2015-04-30",
"2014-08-29","2015-01-13",
"2013-03-05",
"2015-01-05",
"2015-01-01","2013-03-01",
"2015-01-09"),
Systolic = c(93L,96L,
98L,108L,80L,
86L,109L,112L,118L,
137L),
Diastolic = c(69L,73L,
74L,76L,50L,86L,
55L,50L,61L,91L),
HeartRate = c(95L,
100L,87L,81L,84L,
106L,80L,107L,59L,
67L)
)

Is this what you are trying to do?

library(dplyr)

# Sample data on a copy/paste friendly format
df1 <- data.frame(
     Patient = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
      Weight = c(47.6, 48.1, 49.2, 49.7, 76, 46.06, 46.07, 46.53, 46.85, 110.53),
      Height = c(1.57, 1.57, 1.52, 1.55, 1.54, 1.58, 1.58, 1.58, 1.58, 1.82),
        Date = as.factor(c("2014-02-24",
                           "2014-11-13","2015-04-30","2014-08-29","2015-04-30",
                           "2015-01-09","2015-01-01","2015-01-13","2015-01-05",
                           "2013-03-13"))
)

df2 <- data.frame(
     Patient = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
    Systolic = c(93L, 96L, 98L, 108L, 80L, 86L, 109L, 112L, 118L, 137L),
   Diastolic = c(69L, 73L, 74L, 76L, 50L, 86L, 55L, 50L, 61L, 91L),
   HeartRate = c(95L, 100L, 87L, 81L, 84L, 106L, 80L, 107L, 59L, 67L),
        Date = as.factor(c("2014-02-24",
                           "2014-11-13","2015-04-30","2014-08-29","2015-01-13",
                           "2013-03-05","2015-01-05","2015-01-01","2013-03-01",
                           "2015-01-09"))
)

df1 %>% 
    full_join(df2, by = c("Patient", "Date")) %>%
    select(Patient, Date, everything())
#>    Patient       Date Weight Height Systolic Diastolic HeartRate
#> 1        1 2014-02-24  47.60   1.57       93        69        95
#> 2        1 2014-11-13  48.10   1.57       96        73       100
#> 3        1 2015-04-30  49.20   1.52       98        74        87
#> 4        1 2014-08-29  49.70   1.55      108        76        81
#> 5        1 2015-04-30  76.00   1.54       98        74        87
#> 6        2 2015-01-09  46.06   1.58      137        91        67
#> 7        2 2015-01-01  46.07   1.58      112        50       107
#> 8        2 2015-01-13  46.53   1.58       80        50        84
#> 9        2 2015-01-05  46.85   1.58      109        55        80
#> 10       2 2013-03-13 110.53   1.82       NA        NA        NA
#> 11       2 2013-03-05     NA     NA       86        86       106
#> 12       2 2013-03-01     NA     NA      118        61        59

Created on 2020-02-01 by the reprex package (v0.3.0)

Hi andresrcs, thank you for your help.
Yes, I've tried with this function but If I apply to the entire data in df1 (10,170 obs.) and df2 (10,167 obs.), I will get in total 25,000 obs.
If I join df1 and df2 I should get in total something about 10,170 obs.

That would not be the case if you have different combinations of Patient and Date in both datasets (like in your sample data) because of the use of full_join(), this function keeps all combinations and fills missing data with NAs thus resulting in more rows. If you only want to keep combinations from one particular data frame then you should use left_join() or right_join() or if you only want to keep matching combinations among both data frames use inner_join() (this will result in fewer rows).

Thank you very much Andres

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.