How to combine two separate dataframes of different lengths

I have three data-frames, each of a single column, but of different lengths, each with a distinct header. How do I join them into a single data-frame? Example of input.

df1> Case 1    df2> Case 2    df3> Case 3
     ABCCD          TCGGS          ZCRTTS
     PCCGT          GCQZZ
     XGGTS

I'd like to combine them into a single table or data-frame, like:

Case 1   Case 2   Case 3
ABCCD    TCGGS    ZCRTTS
PCCGT    GCQZZ
XGGTS

I've tried using merge and join, but I don't get the result above.

I think the path of least resistance is to

  1. Remove the headers with
colnames(df.x) <- NULL
  2. Row bind
new_df <- rbind(df1,df2,df3)
  3. Assign colnames
colnames(new_df) <- "Case 1 Case 2 Case 3"

BTW: length() of data frames refers to number of columns; to get the number of rows use nrow(). Tripped me up more than once.
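
A quick way to see that difference, as a minimal sketch on a made-up data frame (the name `df` and its contents are only illustrative):

```r
# Hypothetical data frame with 3 rows and 2 columns
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

n_cols <- length(df)  # a data frame is a list of columns, so this counts columns
n_rows <- nrow(df)    # number of rows
dims   <- dim(df)     # rows first, then columns

print(c(columns = n_cols, rows = n_rows))
```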

Interesting. So the first step is to remove the headers, then combine the df and finally add the column names. I'll give it a shot and let you know. Thanks.

Hmm, I'm getting the following error:
Error in if (facCol[jj]) { : missing value where TRUE/FALSE needed
It seems I'm missing a parameter in rbind?

That is hard to say without seeing your code.

Anyhow, have you considered merge's all.x/all.y arguments? dplyr's left_join/right_join offer the same functionality and are explained in detail here: https://r4ds.had.co.nz/relational-data.html.
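
For what it's worth, here is a rough sketch of how merge() could be used for this. It assumes the rows should be matched by position, so it adds a helper key column (`id` is a name made up for this example) before merging. `all = TRUE` is shorthand for setting both all.x and all.y to TRUE.

```r
df1 <- data.frame(case1 = c("ABCCD", "PCCGT", "XGGTS"), stringsAsFactors = FALSE)
df2 <- data.frame(case2 = c("TCGGS", "GCQZZ"), stringsAsFactors = FALSE)

# Hypothetical helper key: the row position within each data frame
df1$id <- seq_len(nrow(df1))
df2$id <- seq_len(nrow(df2))

# Keep unmatched rows from both sides; missing cells are filled with NA
merged <- merge(df1, df2, by = "id", all = TRUE)
merged$id <- NULL  # drop the helper key again

print(merged)
```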

HTH

Dag

Serves me right for not doing a reprex; I'll come back from errands and take another whack at the piñata.

UPDATE: this is for a single column data frame--not responsive, ignore it.

Ok, I've got my head screwed on right; all data frames must have the same non-null column names.

# all three data frames share the column name "X", so rbind() can stack them
df1 <- data.frame(X = 1:10)
df2 <- data.frame(X = 1:20)
df3 <- data.frame(X = 1:30)
rbind(df1,df2,df3)
#>     X
#> 1   1
#> 2   2
#> 3   3
#> 4   4
#> 5   5
#> 6   6
#> 7   7
#> 8   8
#> 9   9
#> 10 10
#> 11  1
#> 12  2
#> 13  3
#> 14  4
#> 15  5
#> 16  6
#> 17  7
#> 18  8
#> 19  9
#> 20 10
#> 21 11
#> 22 12
#> 23 13
#> 24 14
#> 25 15
#> 26 16
#> 27 17
#> 28 18
#> 29 19
#> 30 20
#> 31  1
#> 32  2
#> 33  3
#> 34  4
#> 35  5
#> 36  6
#> 37  7
#> 38  8
#> 39  9
#> 40 10
#> 41 11
#> 42 12
#> 43 13
#> 44 14
#> 45 15
#> 46 16
#> 47 17
#> 48 18
#> 49 19
#> 50 20
#> 51 21
#> 52 22
#> 53 23
#> 54 24
#> 55 25
#> 56 26
#> 57 27
#> 58 28
#> 59 29
#> 60 30

Created on 2020-02-21 by the reprex package (v0.3.0)

This is one option

library(tidyverse)

df1 <- data.frame(stringsAsFactors = FALSE,
                  case1 = c("ABCCD", "PCCGT", "XGGTS"))
df2 <- data.frame(stringsAsFactors = FALSE,
                  case2 = c("TCGGS", "GCQZZ"))
df3 <- data.frame(stringsAsFactors = FALSE,
                  case3 = c("ZCRTTS"))

df1 %>%
    rownames_to_column() %>% 
    left_join(df2 %>% rownames_to_column()) %>% 
    left_join(df3 %>% rownames_to_column()) %>% 
    select(-rowname)
#> Joining, by = "rowname"
#> Joining, by = "rowname"
#>   case1 case2  case3
#> 1 ABCCD TCGGS ZCRTTS
#> 2 PCCGT GCQZZ   <NA>
#> 3 XGGTS  <NA>   <NA>

Disregard the man behind the wall. I screwed up reading the question.

Interesting, but I needed the columns side by side, not combined into a single column.

This seems to work well. Is there a way to have the code avoid inserting NA and just leave the cell blank?

The rownames trick is basically a workaround for the fact that a data frame is fundamentally meant to have the same number of rows for each column.

From the data frame docs:

A data frame is a list of variables of the same number of rows with unique row names, given class "data.frame". If no variables are included, the row names determine the number of rows.

Further down

Objects passed to data.frame should have the same number of rows, but atomic vectors (see is.vector), factors and character vectors protected by I will be recycled a whole number of times if necessary (including as elements of list arguments).

You could replace NA with some other desired value (e.g. an empty string), but something has to be there to maintain the structure of the object.
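
To make the "something has to be there" point concrete, here is a minimal base R sketch. It assumes the three columns start out as plain character vectors (the list name `cols` is made up for this example) and pads the shorter ones with NA so they can sit side by side.

```r
# Hypothetical input: three character vectors of different lengths
cols <- list(
  case1 = c("ABCCD", "PCCGT", "XGGTS"),
  case2 = c("TCGGS", "GCQZZ"),
  case3 = "ZCRTTS"
)

n <- max(lengths(cols))  # length of the longest vector

padded <- lapply(cols, function(x) {
  length(x) <- n  # extending a vector fills the new slots with NA
  x
})

# Now every element has the same length, so a data frame can be built
combined <- as.data.frame(padded, stringsAsFactors = FALSE)
print(combined)
```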

Thanks for the explanation, Mara. How do I pass an empty string instead of the NA?

The sjmisc package has a convenient helper for this: sjmisc::replace_na().

library(tibble)  # tribble()
library(sjmisc)  # replace_na()

testframe <- tribble(~a, ~b,
                     "X", NA,
                     NA,  NA)
testframe
#> # A tibble: 2 x 2
#>   a     b    
#>   <chr> <lgl>
#> 1 X     NA   
#> 2 NA    NA   

replace_na(testframe, value = "")
#> # A tibble: 2 x 2
#>   a     b    
#>   <chr> <chr>
#> 1 "X"   ""   
#> 2 ""    ""

tidyr has its own replace_na(), but it is not as straightforward.

library(tidyverse)

df1 <- data.frame(stringsAsFactors = FALSE,
                  case1 = c("ABCCD", "PCCGT", "XGGTS"))
df2 <- data.frame(stringsAsFactors = FALSE,
                  case2 = c("TCGGS", "GCQZZ"))
df3 <- data.frame(stringsAsFactors = FALSE,
                  case3 = c("ZCRTTS"))

df1 %>%
    rownames_to_column() %>% 
    left_join(df2 %>% rownames_to_column()) %>% 
    left_join(df3 %>% rownames_to_column()) %>% 
    select(-rowname) %>% 
    replace_na(list(case1 = "", case2 = "", case3 = ""))
#> Joining, by = "rowname"
#> Joining, by = "rowname"
#>   case1 case2  case3
#> 1 ABCCD TCGGS ZCRTTS
#> 2 PCCGT GCQZZ       
#> 3 XGGTS

Base alternative here, too. Though I think sjmisc is the easiest way to go!
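
The linked answer isn't reproduced in this thread, so the following is only one possible base R approach, not necessarily the one referred to. It assumes the joined result is an ordinary data frame of character columns (rebuilt here by hand as `combined`) and uses logical matrix indexing to overwrite every NA.

```r
# Hand-built stand-in for the joined result shown earlier in the thread
combined <- data.frame(
  case1 = c("ABCCD", "PCCGT", "XGGTS"),
  case2 = c("TCGGS", "GCQZZ", NA),
  case3 = c("ZCRTTS", NA, NA),
  stringsAsFactors = FALSE
)

# is.na() on a data frame returns a logical matrix; assigning through it
# replaces every missing cell with an empty string
combined[is.na(combined)] <- ""

print(combined)
```

Note that this only makes sense when every column is character; in a numeric column the assignment would coerce the whole column to character.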

Another possibility is to use mutate_all() instead of replace_na():

df1 %>%
  rownames_to_column() %>% 
  left_join(df2 %>% rownames_to_column()) %>% 
  left_join(df3 %>% rownames_to_column()) %>% 
  select(-rowname) %>% 
  mutate_all(~ if_else(is.na(.), "",  .))
  # uses anonymous function syntax with '.' as variable: ~ f(.)

And if df1, df2 and df3 are already tibbles, the analogue would be:

df1 %>% mutate(rowname =  row_number()) %>%
  # since tibbles don't allow row names
  left_join(df2 %>% mutate(rowname =  row_number())) %>% 
  left_join(df3 %>% mutate(rowname =  row_number())) %>% 
  select(-rowname) %>% 
  mutate_if(is.character, ~ if_else(is.na(.), "",  .))
  # "" is only substituted in character columns, since if_else() needs
  # both branches to have the same type
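
In more recent dplyr releases (1.0.0 onward, as far as I know), mutate_all() and mutate_if() are superseded by across(). A sketch of the same replacement in that style, assuming such a dplyr version is installed and using a hand-built tibble (`combined`) in place of the joined result:

```r
library(dplyr)

# Hand-built stand-in for the joined result
combined <- tibble(
  case1 = c("ABCCD", "PCCGT", "XGGTS"),
  case2 = c("TCGGS", "GCQZZ", NA),
  case3 = c("ZCRTTS", NA, NA)
)

# coalesce() keeps each value unless it is NA, in which case it takes ""
blanked <- combined %>%
  mutate(across(where(is.character), ~ coalesce(.x, "")))

print(blanked)
```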
