unnest_longer() behaves unexpectedly with list column of tibbles

When I use unnest_longer() on a list column that contains tibbles, the output is not what I expected. I am unsure how best to describe the problem, so please refer to the reprex below.

With the legacy unnest(), a new column is created for each column of the tibbles in the list column. With unnest_longer(), no new columns are created; instead, the unnested column becomes a tibble column. This looks fine when printed, but the inner column cannot be selected with standard dplyr functions such as select(), or if it can, it is unclear how to do so.

Is there a way to make unnest_longer() unnest the tibbles the way the legacy unnest() does? Or is there a new approach to do this within the tidyverse?

library(tidyr)
library(dplyr)

tbl_list <- list(
  tibble(y = letters[1:3]),
  tibble(y = letters[4:6]),
  tibble(y = letters[7:9])
)

my_tbl <- tibble(id = 1:3, x = tbl_list)

(tbl_legacy_unnest <- my_tbl %>% 
    unnest(x))
#> # A tibble: 9 x 2
#>      id y    
#>   <int> <chr>
#> 1     1 a    
#> 2     1 b    
#> 3     1 c    
#> 4     2 d    
#> 5     2 e    
#> 6     2 f    
#> 7     3 g    
#> 8     3 h    
#> 9     3 i
  
(tbl_unnest_longer <- my_tbl %>% 
    unnest_longer(x))
#> # A tibble: 9 x 2
#>      id x$y  
#>   <int> <chr>
#> 1     1 a    
#> 2     1 b    
#> 3     1 c    
#> 4     2 d    
#> 5     2 e    
#> 6     2 f    
#> 7     3 g    
#> 8     3 h    
#> 9     3 i
  
  select(tbl_legacy_unnest, y)
#> # A tibble: 9 x 1
#>   y    
#>   <chr>
#> 1 a    
#> 2 b    
#> 3 c    
#> 4 d    
#> 5 e    
#> 6 f    
#> 7 g    
#> 8 h    
#> 9 i
  select(tbl_unnest_longer, y)
#> Error: Can't subset columns that don't exist.
#> x Column `y` doesn't exist.

Created on 2020-10-06 by the reprex package (v0.3.0)

suppressPackageStartupMessages(library(tidyr))

tbl_list <- list(
  tibble(y = letters[1:3]),
  tibble(y = letters[4:6]),
  tibble(y = letters[7:9])
)

my_tbl <- tibble(id = 1:3, x = tbl_list)
unnest_legacy(my_tbl)
#> # A tibble: 9 x 2
#>      id y    
#>   <int> <chr>
#> 1     1 a    
#> 2     1 b    
#> 3     1 c    
#> 4     2 d    
#> 5     2 e    
#> 6     2 f    
#> 7     3 g    
#> 8     3 h    
#> 9     3 i

Created on 2020-10-06 by the reprex package (v0.3.0.9001)

Are you sure you want to use unnest_longer() in this case? From the docs here:

These principles guide their behaviour when they are called with a non-primary data type. For example, if you unnest_wider() a list of data frames, the number of rows must be preserved, so each column is turned into a list column of length one. Or if you unnest_longer() a list of data frame, the number of columns must be preserved so it creates a packed column. I'm not sure how if these behaviours are useful in practice, but they are theoretically pleasing.

The key phrase there about unnest_longer() is "the number of columns must be preserved so it creates a packed column" (my emphasis). You probably want to use unnest() (the newer variant). But in case you don't, you can still interact with y (which I assume is your end goal here), like so:

mutate(tbl_unnest_longer, rev_y = rev(x$y))
#> # A tibble: 9 x 3
#>      id x$y   rev_y
#>   <int> <chr> <chr>
#> 1     1 a     i    
#> 2     1 b     h    
#> 3     1 c     g    
#> 4     2 d     f    
#> 5     2 e     e    
#> 6     2 f     d    
#> 7     3 g     c    
#> 8     3 h     b    
#> 9     3 i     a    
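
If you do want to stay with unnest_longer() and then get y back as an ordinary column, one option that I believe fits is tidyr::unpack(), which flattens a packed data frame column into its component columns. This is only a sketch: it rebuilds the reprex data from the question, and the names tbl_unpacked and only_a are mine, made up for illustration.

```r
library(tidyr)
library(dplyr)

# Same data as in the question's reprex
my_tbl <- tibble(
  id = 1:3,
  x = list(
    tibble(y = letters[1:3]),
    tibble(y = letters[4:6]),
    tibble(y = letters[7:9])
  )
)

# unnest_longer() keeps `x` as a single packed (data frame) column
tbl_unnest_longer <- unnest_longer(my_tbl, x)

# unpack() turns the packed column `x` into its inner columns
# (tbl_unpacked is a hypothetical name used only in this sketch)
tbl_unpacked <- unpack(tbl_unnest_longer, x)
select(tbl_unpacked, y)

# Without unpacking, the inner column is still reachable as x$y
# inside dplyr verbs that use data masking
only_a <- filter(tbl_unnest_longer, x$y == "a")
```

After unpack(), select(), rename() and friends should see y directly. Without it, select() only knows about the outer name x, which is why select(tbl_unnest_longer, y) fails.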

And here's just a plain-ole unnest():

unnest(my_tbl, x)
#> # A tibble: 9 x 2
#>      id y    
#>   <int> <chr>
#> 1     1 a    
#> 2     1 b    
#> 3     1 c    
#> 4     2 d    
#> 5     2 e    
#> 6     2 f    
#> 7     3 g    
#> 8     3 h    
#> 9     3 i 

@mmuurr, that's super helpful. I didn't realize that the current unnest() is itself the modern iteration of the legacy function.

It is still unclear to me what a packed column or a "non-primary data type" is. I guess that's the next topic for me to understand.

Very helpful!
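
For anyone who lands here with the same question about packed columns: as far as I understand it, a packed column is simply a data frame stored as one column of another data frame. The sketch below builds one by hand and converts back and forth with tidyr's pack() and unpack(). The object names packed, flat and repacked are made up for illustration. My reading of "non-primary data type" in the quoted docs is a list of data frames handed to unnest_longer() or unnest_wider(), which are mainly designed for lists of vectors, but treat that as an interpretation rather than a definition.

```r
library(tidyr)

# A packed column: `x` is itself a tibble with one inner column, `y`
packed <- tibble(id = 1:3, x = tibble(y = letters[1:3]))

ncol(packed)             # `x` counts as one column, whatever it contains
is.data.frame(packed$x)  # the column is a data frame, not a list
packed$x$y               # inner columns are reached through `x`

# unpack() flattens the packed column into ordinary columns
flat <- unpack(packed, x)

# pack() goes the other way: bundle `y` into a data frame column called `x`
repacked <- pack(flat, x = y)
```

This is also why the printed header in the question reads x$y rather than y: the outer tibble has a column x, and y lives inside it.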