The name you are looking for is df-column (df-col for short): a column of a data frame that is itself another data frame. This is different from a list-column, where each element of the column is a separate object, but we have named it following the same convention.
They have limited uses, and support for them has only been added relatively recently to the tidyverse, but there are places where they come up, and there are a few tools to handle them.
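To make the distinction from a list-column concrete, here is a minimal sketch. The object names (`inner`, `df_col`, `list_col`) are made up for illustration and are not part of any package.

```r
library(tibble)

inner <- tibble(a = 1:3, b = 4:6)

# df-column: `col` is ONE data frame with the same number of rows as the outer tibble
df_col <- tibble(id = 1:3, col = inner)

# list-column: `col` is a list with one element per row; here each element
# happens to be a 1-row tibble, but it could be any object
list_col <- tibble(id = 1:3, col = list(inner[1, ], inner[2, ], inner[3, ]))

is.data.frame(df_col$col)    # the df-column is a data frame
is.data.frame(list_col$col)  # the list-column is a plain list, not a data frame
length(list_col$col)         # one list element per row
```

In the df-column, row `i` of `col` lines up with row `i` of the outer tibble. In the list-column, element `i` of `col` belongs to row `i`.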
In tidyr, unpack() takes a df-col and effectively flattens the nesting structure. pack() goes the other way, and is another way to construct a df-col.
library(tibble)
library(tidyr)
tibble1 <- tibble(a = 1:5, b = 2:6, c = 3:7)
tibble2 <- tibble(d = 4:8, col = tibble1)
# `col` is a df-column: a column that is itself a data frame
tibble2
#> # A tibble: 5 x 2
#>       d col$a    $b    $c
#>   <int> <int> <int> <int>
#> 1     4     1     2     3
#> 2     5     2     3     4
#> 3     6     3     4     5
#> 4     7     4     5     6
#> 5     8     5     6     7
# Extract it by name to get access to the "column", but the column itself is a data frame!
tibble2$col
#> # A tibble: 5 x 3
#>       a     b     c
#>   <int> <int> <int>
#> 1     1     2     3
#> 2     2     3     4
#> 3     3     4     5
#> 4     4     5     6
#> 5     5     6     7
# "unpack" that df-column into its individual columns
unpacked <- unpack(tibble2, col)
unpacked
#> # A tibble: 5 x 4
#>       d     a     b     c
#>   <int> <int> <int> <int>
#> 1     4     1     2     3
#> 2     5     2     3     4
#> 3     6     3     4     5
#> 4     7     4     5     6
#> 5     8     5     6     7
# Pack 3 columns into a new df-column
pack(unpacked, col = a:c)
#> # A tibble: 5 x 2
#>       d col$a    $b    $c
#>   <int> <int> <int> <int>
#> 1     4     1     2     3
#> 2     5     2     3     4
#> 3     6     3     4     5
#> 4     7     4     5     6
#> 5     8     5     6     7
You might be surprised to learn that the more popular unnest() function is actually implemented using unpack().
tidyr:::unnest.data.frame
#> function (data, cols, ..., keep_empty = FALSE, ptype = NULL,
#>     names_sep = NULL, names_repair = "check_unique", .drop = "DEPRECATED",
#>     .id = "DEPRECATED", .sep = "DEPRECATED", .preserve = "DEPRECATED")
#> {
#>     cols <- tidyselect::eval_select(enquo(cols), data)
#>     if (nrow(data) == 0) {
#>         for (col in names(cols)) {
#>             data[[col]] <- as_empty_df(data[[col]], col = col)
#>         }
#>     }
#>     else {
#>         for (col in names(cols)) {
#>             data[[col]] <- map(data[[col]], as_df, col = col)
#>         }
#>     }
#>     data <- unchop(data, any_of(cols), keep_empty = keep_empty,
#>         ptype = ptype)
#>     unpack(data, any_of(cols), names_sep = names_sep, names_repair = names_repair)
#> }
#> <bytecode: 0x7fca9edb3c80>
#> <environment: namespace:tidyr>
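The last two calls in that function body suggest that unnesting a list-column of data frames is roughly unchop() followed by unpack(). The following is a rough sketch of that idea, not a replacement for unnest(). The example data and the object names (`nested`, `chopped`, `via_two_steps`) are made up for illustration.

```r
library(tibble)
library(tidyr)

# A list-column of data frames, one element per row
nested <- tibble(
  g = c("a", "b"),
  data = list(tibble(x = 1:2, y = 3:4), tibble(x = 5L, y = 6L))
)

# The usual one-step approach
via_unnest <- unnest(nested, data)

# Step 1: unchop() repeats the outer rows and row-binds the list elements,
# which turns the list-column into a df-column
chopped <- unchop(nested, data)
is.data.frame(chopped$data)

# Step 2: unpack() flattens that df-column into ordinary columns
via_two_steps <- unpack(chopped, data)

all.equal(via_unnest, via_two_steps)
```

The intermediate `chopped` object is a natural place where a df-col shows up even if you never create one on purpose.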
In dplyr, the new across() function actually returns a data frame, and if you name the result in mutate() then you can create a df-col, even though it is normally used without naming the result.
library(dplyr)
tbl <- tibble(x = 1:5, y = 6:10)
# `across()` returns a data frame, which is "packed" into a df-col called `col`
tbl_packed <- mutate(tbl, col = across(x:y, ~.x - 1L, .names = "{.col}_minus_one"))
tbl_packed
#> # A tibble: 5 x 3
#>       x     y col$x_minus_one $y_minus_one
#>   <int> <int>           <int>        <int>
#> 1     1     6               0            5
#> 2     2     7               1            6
#> 3     3     8               2            7
#> 4     4     9               3            8
#> 5     5    10               4            9
# Effectively the same as:
mutate(tbl, col = tibble(x_minus_one = tbl$x - 1L, y_minus_one = tbl$y - 1L))
#> # A tibble: 5 x 3
#>       x     y col$x_minus_one $y_minus_one
#>   <int> <int>           <int>        <int>
#> 1     1     6               0            5
#> 2     2     7               1            6
#> 3     3     8               2            7
#> 4     4     9               3            8
#> 5     5    10               4            9
# The more popular way to use `across()` is to not name the result.
# This causes the data frame that `across()` returns to be "auto-unpacked"
# (i.e. it automatically does what `tidyr::unpack()` would do)
mutate(tbl, across(x:y, ~.x - 1L, .names = "{.col}_minus_one"))
#> # A tibble: 5 x 4
#>       x     y x_minus_one y_minus_one
#>   <int> <int>       <int>       <int>
#> 1     1     6           0           5
#> 2     2     7           1           6
#> 3     3     8           2           7
#> 4     4     9           3           8
#> 5     5    10           4           9
# Compare against:
tidyr::unpack(tbl_packed, col)
#> # A tibble: 5 x 4
#>       x     y x_minus_one y_minus_one
#>   <int> <int>       <int>       <int>
#> 1     1     6           0           5
#> 2     2     7           1           6
#> 3     3     8           2           7
#> 4     4     9           3           8
#> 5     5    10           4           9
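The same named versus unnamed behaviour is not specific to across(). As a small sketch, assuming dplyr 1.0.0 or later, any expression in mutate() that returns a data frame is treated the same way. `sum_and_diff()` is a hypothetical helper written only for this example.

```r
library(dplyr)
library(tibble)

tbl <- tibble(x = 1:5, y = 6:10)

# Hypothetical helper that returns a data frame with two derived columns
sum_and_diff <- function(x, y) {
  tibble(sum = x + y, diff = y - x)
}

# Unnamed: the returned data frame is auto-unpacked into top-level columns
auto_unpacked <- mutate(tbl, sum_and_diff(x, y))
names(auto_unpacked)

# Named: the returned data frame is stored as a df-col called `stats`
packed <- mutate(tbl, stats = sum_and_diff(x, y))
names(packed)
names(packed$stats)
```

Naming the result is therefore the switch between getting a df-col and getting ordinary columns.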
So we've been finding some interesting places to use these ideas, and they are definitely not considered off-label usage!