Split uneven length vectors to columns with tidyr

Without trying to step on @mara's toes, here's my understanding of how dealing with lists → vectors should typically fall out:

  • rlang::flatten and rlang::squash: Input is a list. flatten removes one "level" from the list, while squash removes all levels (roughly equivalent to running flatten repeatedly until the data has no lists left). unlist is the base version of these, but with it the type of the output can be difficult to predict on unknown data. Ideally, you use a typed variant flatten_*, like flatten_int, so that you force the output into the form you expect as part of the process, rather than potentially going through multiple conversions and the problems those can cause.
  • tidyr::unnest: Input is a data frame that includes at least one list column (containing vectors or data frames). It will duplicate all other columns so that each item of the vector (or each row of the data frame) gets its own row. It is inverted by tidyr::nest, which is one potential way that the list column could have been generated in the first place.
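To make the two cases above concrete, here is a minimal sketch. It assumes the purrr export of flatten_int (the post names the rlang versions) and a tidyr version whose unnest accepts a cols argument. The objects nested, df, flat and long are made up for illustration.

```r
library(purrr)
library(tidyr)
library(tibble)

# A plain list holding integer vectors of uneven length (illustrative data)
nested <- list(1:2, 3:5)

# Working on the list directly: flatten_int() removes one level and
# insists on an integer vector as the result
flat <- flatten_int(nested)

# Working in a data frame: the same list stored as a list column
df <- tibble(id = c("a", "b"), value = nested)

# unnest() gives each vector element its own row and repeats id to match
long <- unnest(df, cols = value)
```

Here flat is a single integer vector, while long keeps the id column so you can still tell which original row each value came from.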

So, in general, if you are already working in a data frame, you use unnest (like in this example). If you are working on the list directly, you use flatten and friends. The GitHub issues that you pointed out run into problems because unnest currently only handles list columns containing atomic vectors or data frames, not a list of lists. That's not super common (at least in my usage), but it should be a future feature of tidyr, given this GitHub issue:

That shows a good toy example of where the current tools fall down, but I don't think it really applies in this case.

Also, you are correct that the pluck function could be used instead of [[1]], though the later version of the code makes that unnecessary. You could also use map, since passing a number as its function argument extracts that element, which is similar to mapping pluck onto each element of the input. These four methods of using str_split give equivalent output:

suppressPackageStartupMessages(library(tidyverse))

fruits <- c(
  "apples and oranges and pears and bananas",
  "pineapples and mangos and guavas"
)

# Equivalent to original code
map(fruits, ~ str_split(.x, " and ")[[1]])

# With pluck
map(fruits,
    ~ str_split(.x, " and ") %>% 
      pluck(1))

# With a second map
map(fruits, ~ str_split(.x, " and ")) %>% 
  map(1)

# Using the standard str_split method
str_split(fruits, " and ")
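If you want to confirm the equivalence rather than compare printed output by eye, a rough check along these lines should work. The variable names are made up for illustration, and the pluck call is written without the pipe so that only purrr and stringr are needed.

```r
library(purrr)
library(stringr)

fruits <- c(
  "apples and oranges and pears and bananas",
  "pineapples and mangos and guavas"
)

# The four approaches from above, each stored under an illustrative name
with_index <- map(fruits, ~ str_split(.x, " and ")[[1]])
with_pluck <- map(fruits, ~ pluck(str_split(.x, " and "), 1))
with_map   <- map(map(fruits, ~ str_split(.x, " and ")), 1)
vectorised <- str_split(fruits, " and ")

# stopifnot() is silent when every comparison holds and errors otherwise
stopifnot(
  identical(with_index, with_pluck),
  identical(with_index, with_map),
  identical(with_index, vectorised)
)
```

Since str_split is already vectorised over its input, the last form is the simplest of the four.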