How to convert a heterogeneous list of vectors to a 2 column data frame?

jdblischak · September 12, 2019, 1:30pm

I have a heterogeneous list of vectors that I want to convert to a two column data frame. I have found a few solutions, but they are all quite complex. And I’m having trouble searching for ideas because I only find solutions for lists of vectors of the same length. Is there a simpler way to accomplish this task?

Here is a minimal example of the starting list. The vectors vary in length, and list elements can also be empty vectors or even NULL.

input <- list(A = letters[1:3], B = letters[3:4], C = NULL, D = character(0))
input

## $A
## [1] "a" "b" "c"
## 
## $B
## [1] "c" "d"
## 
## $C
## NULL
## 
## $D
## character(0)

And here is my desired output data frame. Each row corresponds to one of the elements of the vectors in the list of vectors, i.e. the first column is the name of the list element and the second column is the element of the vector. List elements with no data (e.g. NULL or character(0)) are omitted:

output <- data.frame(name = c(rep("A", length(input$A)), rep("B", length(input$B))),
                     item = c(input$A, input$B), stringsAsFactors = FALSE)
output

##   name item
## 1    A    a
## 2    A    b
## 3    A    c
## 4    B    c
## 5    B    d

I tried unlist(), which properly omits the empty list elements. But unfortunately it appends numbers to the names, which would require writing a fragile regex to remove them (e.g. what if the names of the list elements ended in numbers?).

list2df_unlist <- function(x) {
  tmp <- unlist(x)
  data.frame(name = names(tmp), item = tmp, stringsAsFactors = FALSE)
}
list2df_unlist(input)

##    name item
## A1   A1    a
## A2   A2    b
## A3   A3    c
## B1   B1    c
## B2   B2    d
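
One way to sidestep the renaming problem, offered as a sketch rather than something proposed in this thread: build the name column from the list names and lengths(), and call unlist() with use.names = FALSE so the suffixed names are never created. The function name list2df_rep is hypothetical, and the sketch assumes every list element is named.

```r
# Hypothetical helper (not from the thread); uses base R only.
list2df_rep <- function(x) {
  data.frame(
    # Repeat each list name once per vector element;
    # NULL and zero-length entries are repeated 0 times.
    name = rep(names(x), lengths(x)),
    # use.names = FALSE keeps unlist() from generating names such as "A1".
    item = unlist(x, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

input <- list(A = letters[1:3], B = letters[3:4], C = NULL, D = character(0))
result_rep <- list2df_rep(input)
result_rep
```

For the example input this should give the five-row name/item data frame shown as the desired output. If every element of the list is empty, unlist() returns NULL and the item column is dropped, so that case would need separate handling.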

My solution using base R used mapply() + do.call() and also required a separate helper function to properly filter the empty list elements.

list2df_mapply <- function(x) {
  list_to_df <- function(name, vec) {
    if (is.null(vec) || length(vec) == 0) return(NULL)
    
    data.frame(name = name, item = vec, stringsAsFactors = FALSE)
  }
  
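  # SIMPLIFY = FALSE always returns a list, even when every piece has the same length
  tmp <- mapply(list_to_df, as.list(names(x)), x, SIMPLIFY = FALSE)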
  do.call(rbind, tmp)
}
list2df_mapply(input)

##   name item
## 1    A    a
## 2    A    b
## 3    A    c
## 4    B    c
## 5    B    d

My solution with purrr is simpler because it replaces mapply() + do.call() with a single call to map2_dfr(), but it still requires the helper function.

list2df_purrr <- function(x) {
  list_to_df <- function(name, vec) {
    if (is.null(vec) || length(vec) == 0) return(NULL)
    
    data.frame(name = name, item = vec, stringsAsFactors = FALSE)
  }
  purrr::map2_dfr(names(x), x, list_to_df)
}
list2df_purrr(input)

##   name item
## 1    A    a
## 2    A    b
## 3    A    c
## 4    B    c
## 5    B    d

I also explored purrr::imap_dfr(), but couldn’t get it to work. Any ideas on how to make this transformation code more readable? Thanks!

FJCC · September 12, 2019, 2:00pm

Almost there. Just the name of column 2 is ugly.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
input <- list(A = letters[1:3], B = letters[3:4], C = NULL, D = character(0))

input2 <- lapply(input, as.data.frame, stringsAsFactors = FALSE)
DF <- bind_rows(input2, .id = "Name")
DF
#>   Name X[[i]]
#> 1    A      a
#> 2    A      b
#> 3    A      c
#> 4    B      c
#> 5    B      d

Created on 2019-09-12 by the reprex package (v0.2.1)

Yarnabrina · September 12, 2019, 2:43pm

Using as_tibble() avoids this problem, because the second column is then named value. Just changing the definition of input2 should be enough.

Modification of the code by @FJCC:

library(magrittr)

input <- list(A = letters[1:3],
              B = letters[3:4],
              C = NULL,
              D = character(0))

input %>%
  purrr::map(.f = tibble::as_tibble) %>%
  dplyr::bind_rows(.id = "name")
#> # A tibble: 5 x 2
#>   name  value
#>   <chr> <chr>
#> 1 A     a    
#> 2 A     b    
#> 3 A     c    
#> 4 B     c    
#> 5 B     d

Alternative solution:

library(purrr)
library(tibble)

input <- list(A = letters[1:3],
              B = letters[3:4],
              C = NULL,
              D = character(0))

map_dfr(.x = input,
        .f = ~ enframe(x = .x,
                       name = NULL,
                       value = "Value does matter"),
        .id = "What's in a name")
#> # A tibble: 5 x 2
#>   `What's in a name` `Value does matter`
#>   <chr>              <chr>              
#> 1 A                  a                  
#> 2 A                  b                  
#> 3 A                  c                  
#> 4 B                  c                  
#> 5 B                  d

jdblischak · September 12, 2019, 5:52pm

@FJCC @Yarnabrina Thanks to both of you for your help! Below I've converted your suggestions into the function format I was using:

Solution from @FJCC:

list2df_dplyr <- function(x) {
  tmp <- lapply(x, as.data.frame, stringsAsFactors = FALSE)
  tmp <- dplyr::bind_rows(tmp, .id = "name")
  colnames(tmp)[2] <- "item"
  tmp
}
list2df_dplyr(input)

##   name item
## 1    A    a
## 2    A    b
## 3    A    c
## 4    B    c
## 5    B    d

Solutions from @Yarnabrina:

list2df_tibble <- function(x) {
  tmp <- purrr::map(x, tibble::as_tibble)
  dplyr::bind_rows(tmp, .id = "name")
}
list2df_tibble(input)

## # A tibble: 5 x 2
##   name  value
##   <chr> <chr>
## 1 A     a    
## 2 A     b    
## 3 A     c    
## 4 B     c    
## 5 B     d

list2df_enframe <- function(x) {
  purrr::map_dfr(x, ~ tibble::enframe(x = .x, name = NULL, value = "item"),
                 .id = "name")
}
list2df_enframe(input)

## # A tibble: 5 x 2
##   name  item 
##   <chr> <chr>
## 1 A     a    
## 2 A     b    
## 3 A     c    
## 4 B     c    
## 5 B     d

I like the succinctness of this final approach. The main confusion I see with it (e.g. when returning to the code months later) is that you have to set name = NULL in the call to enframe() because the name column is instead added by map_dfr().

jdblischak · September 12, 2019, 6:00pm

And here is a solution using data.table. It is analogous to the dplyr solution, replacing bind_rows() with rbindlist().

list2df_dt <- function(x) {
  tmp <- lapply(x, as.data.frame, stringsAsFactors = FALSE)
  tmp <- data.table::rbindlist(tmp, idcol = "name")
  colnames(tmp)[2] <- "item"
  tmp
}
list2df_dt(input)

##    name item
## 1:    A    a
## 2:    A    b
## 3:    A    c
## 4:    B    c
## 5:    B    d

It seems that the reason that this is so much more cumbersome using only base R is that the do.call(rbind, list) paradigm doesn't provide a mechanism for adding an ID column.
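
For comparison, base R does ship one function that builds an ID column from list names: utils::stack(). The sketch below is an assumption on my part and not something proposed in this thread. The function name list2df_stack is hypothetical, and the empty elements are dropped first so that stack() only sees non-empty vectors.

```r
# Hypothetical helper (not from the thread); uses base R only.
list2df_stack <- function(x) {
  # Keep only elements that contain data (drops NULL and zero-length vectors).
  x <- x[lengths(x) > 0]
  # stack() returns a data frame with columns `values` and `ind`,
  # where `ind` is a factor built from the list names.
  stacked <- utils::stack(x)
  data.frame(
    name = as.character(stacked$ind),
    item = stacked$values,
    stringsAsFactors = FALSE
  )
}

input <- list(A = letters[1:3], B = letters[3:4], C = NULL, D = character(0))
result_stack <- list2df_stack(input)
result_stack
```

The reshaping at the end is only there to match the name/item column order and types used elsewhere in this thread. Note that stack() stops with an error when no non-empty vector remains, so a list containing only empty elements would need a guard.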
