unlist columns with lists of different length

Hello,

I have tibble with with i.e. two columns and only one row. Each cell contains a list. I would like to unnest them 'longer', so that each element of list has its own row. The problem seems to be that the lists have different lengths, what implies that one column (with the shorter list) should have NAs at the end.

I saw this related post, but was wondering whether there is a more elegant way.

Many thanks. r

library(tidyverse)

#what I have
my_df <- tibble(country1=list(c("country1_case1", "country2_case2")),
             country2=list(c("country2_case1", "country2_case2", "country2_cases3")))

#what I want:
goal_df <- tibble::tribble(
         ~country1,        ~country2,
  "country1_case1", "country2_case1",
  "country1_case2", "country2_case2",
  "NA", "country2_case3")

#something like?
df %>% 
  unnest_longer(col=c(country1, country2))
#> Error: object 'country1' not found

Created on 2020-09-10 by the reprex package (v0.3.0)

Inspired by this Stackoverflow post. The trick is to first make the list-cols the same length and then use plain old unnest().

library(tidyverse)

# what I have
my_df <- tibble(
  country1 = list(c("country1_case1", "country2_case2")),
  country2 = list(c("country2_case1", "country2_case2", "country2_cases3"))
)

# what I want:
goal_df <- tibble::tribble(
  ~country1, ~country2,
  "country1_case1", "country2_case1",
  "country1_case2", "country2_case2",
  "NA", "country2_case3"
)

my_df %>% 
  pivot_longer(everything()) %>% 
  mutate(value = map(value, `length<-`, max(lengths(value)))) %>% 
  pivot_wider(names_from = name, values_from = value) %>% 
  unnest(everything())
#> # A tibble: 3 x 2
#>   country1       country2       
#>   <chr>          <chr>          
#> 1 country1_case1 country2_case1 
#> 2 country2_case2 country2_case2 
#> 3 <NA>           country2_cases3

Created on 2020-09-10 by the reprex package (v0.3.0)

1 Like

Excellent. Many thanks!

Siddharth's solution is perfectly great, but just for fun I wanted to have a go with a slightly different approach. I must say I needed the length<-() function though, the first time I've used one of those <- functions, that I'm aware of. So that was a nice piece of learning for me.

Here's my alternative approach:

library(dplyr)
library(purrr)
library(tidyr)

rectangularize <- function(df) {
  max_len <- df %>%
    map(unlist) %>%
    lengths() %>%
    max()

  df %>%
    map(unlist) %>%
    map(~ `length<-`(., max_len)) %>%
    bind_cols()
}

tibble(
  country1 = list(c("country1_case1", "country2_case2")),
  country2 = list(c("country2_case1", "country2_case2", "country2_cases3"))
) %>%
  rectangularize()
#> # A tibble: 3 x 2
#>   country1       country2       
#>   <chr>          <chr>          
#> 1 country1_case1 country2_case1 
#> 2 country2_case2 country2_case2 
#> 3 <NA>           country2_cases3

Created on 2020-09-11 by the reprex package (v0.3.0)

4 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.