What is the purpose of tibble$sub_tibble$columns?

Hey all,

Sorry for the poor title. I don't know what this feature is called, and Googling didn't help me find a name for it. I noticed that if I have a tibble:

library(dplyr)  # re-exports tibble() and %>%
tibble_1 <- tibble(a = 1:10, b = 2:11, c = 3:12)

And I add this into another tibble using dplyr::mutate:

tibble_2 <- tibble(d = 4:13) %>%
    mutate(tibble_one = tibble_1)

The print output looks a bit funny:

# A tibble: 10 x 2
       d tibble_one$a    $b    $c
   <int>        <int> <int> <int>
 1     4            1     2     3
 2     5            2     3     4
 3     6            3     4     5
 4     7            4     5     6
 5     8            5     6     7
 6     9            6     7     8
 7    10            7     8     9
 8    11            8     9    10
 9    12            9    10    11
10    13           10    11    12

And, indeed, if I wanted to subset the columns I'd have to address them as tibble_2$tibble_one$a, etc.
I accidentally came across this when I tried to get the result one would get with

library(purrr)  # for pmap()
tibble_2 <- tibble(d = 4:13) %>%
    mutate(tibble_one = pmap(tibble_1, c))
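To make the difference between the two results concrete, here is a minimal sketch that builds both and inspects the new column. The names df_col and list_col are only illustrative, and the sketch assumes a dplyr version (1.0 or later) in which mutate() accepts a data frame as a named value.

```r
library(tibble)
library(dplyr)
library(purrr)

tibble_1 <- tibble(a = 1:10, b = 2:11, c = 3:12)

# Variant 1: the whole tibble becomes a single column that is itself a data frame
df_col <- tibble(d = 4:13) %>%
  mutate(tibble_one = tibble_1)

# Variant 2: pmap() calls c() once per row, giving a list with one vector per row
list_col <- tibble(d = 4:13) %>%
  mutate(tibble_one = pmap(tibble_1, c))

is.data.frame(df_col$tibble_one)    # the column is a data frame
is.data.frame(list_col$tibble_one)  # the column is a plain list
list_col$tibble_one[[1]]            # a named integer vector for row 1
df_col$tibble_one$a                 # an ordinary integer vector
```

In the first variant the inner columns stay regular vectors that can be addressed with $, while in the second each row holds its own small vector.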

So my question is: what's the purpose behind this? Can it be used for something useful?

Thanks in advance.

This seems like an off-label use that avoids an error only by 'accident'.
Here's a story about recent off-label uses in the ggplot2 part of the tidyverse:
Off-label uses in ggplot2 (tidyverse.org)

If so, would it be wise to report this as an issue on their GitHub? If that's the case, I'll leave this thread open for a bit before doing so, in case someone drops in with an unexpected use case.

Tibble is another name for the S3 class tbl_df.
The tbl_df class is a special type of data frame.
You can think of a tibble as a different way to create data frames.
There are 2 main differences from a base data.frame:

  1. tibble() never changes the type of its inputs.
  2. tibble() evaluates its arguments lazily and sequentially, and it only recycles vectors of length 1, so length mismatches surface as errors and bugs are easier to detect.

You can learn more below:
https://tibble.tidyverse.org/
Tibbles
Hope this helps
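The recycling and evaluation behaviour described above can be checked with a short sketch. The object names (ok, res, base_df, seq_tbl) are made up for illustration.

```r
library(tibble)

# A length-1 input is recycled to the common length
ok <- tibble(x = 1:4, y = 1)

# Any other length mismatch is an error in tibble() ...
res <- tryCatch(
  tibble(x = 1:4, y = 1:2),
  error = function(e) "error"
)

# ... whereas base data.frame() silently recycles y to 1, 2, 1, 2
base_df <- data.frame(x = 1:4, y = 1:2)

# Arguments are evaluated sequentially, so y can refer to x
seq_tbl <- tibble(x = 1:3, y = x * 2)
```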

Thanks for your reply, but my question wasn't about what a tibble is. I know what they are and am quite fond of them, but I accidentally came across this 'feature' and wanted to learn if it's by design and whether there are use cases for it.

I am not sure but you may have run into a nested tibble. Nest and unnest — nest • tidyr

I ran into them once but never really figured out what they did.

The type of nested tibbles they speak of would be a list column of tibbles (i.e. one tibble per row). An example:

> tibble_3
# A tibble: 10 x 3
       a     b     c
   <int> <int> <dbl>
 1     1     1     2
 2     1     6     7
 3     2     2     3
 4     2     7     8
 5     3     3     4
 6     3     8     9
 7     4     4     5
 8     4     9    10
 9     5     5     6
10     5    10    11
> tibble_3 %>%
+   group_by(a) %>%
+   nest()
# A tibble: 5 x 2
# Groups:   a [5]
      a data            
  <int> <list>          
1     1 <tibble [2 x 2]>
2     2 <tibble [2 x 2]>
3     3 <tibble [2 x 2]>
4     4 <tibble [2 x 2]>
5     5 <tibble [2 x 2]>

As you can see, the resulting structure is rather different from the one I accidentally encountered.
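For anyone who wants to compare the two structures directly, here is a rough sketch. It rebuilds tibble_3 from the printed values above, and it uses tidyr::pack() as one way (my assumption about the closest equivalent) to get the column-of-columns layout from the same data.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Rebuilt from the printed output above
tibble_3 <- tibble(
  a = rep(1:5, each = 2),
  b = c(1L, 6L, 2L, 7L, 3L, 8L, 4L, 9L, 5L, 10L),
  c = b + 1
)

# Nested: `data` is a list with one small tibble per row
nested <- tibble_3 %>%
  group_by(a) %>%
  nest()
is.data.frame(nested$data)  # FALSE, it is a list
dim(nested$data[[1]])       # dimensions of the tibble stored in row 1

# Packed: `data` is a single tibble spanning all rows
packed <- pack(tibble_3, data = c(b, c))
is.data.frame(packed$data)  # TRUE
dim(packed$data)            # same number of rows as tibble_3
```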

Ah yes, I see what you mean. I'll go with the off-label explanation then: something that's not supposed to exist but does.

I have reposted this on the tibble GitHub page: Possible off-label use of "column-nesting" tibbles · Issue #899 · tidyverse/tibble · GitHub

The name you are searching for is df-column. This is a column of a data frame that is itself another data frame. This is different from a list-column, but we have named it with the same conventions.

They have limited uses, and support for them has only been added relatively recently to the tidyverse, but there are places where they come up, and there are a few tools to handle them.

In tidyr, unpack() takes a df-col and effectively flattens the nesting structure. pack() goes the other way, and is another way to construct a df-col.

library(tibble)
library(tidyr)

tibble1 <- tibble(a = 1:5, b = 2:6, c = 3:7)
tibble2 <- tibble(d = 4:8, col = tibble1)

# `col` is a df-column: a column that is itself a data frame
tibble2
#> # A tibble: 5 x 2
#>       d col$a    $b    $c
#>   <int> <int> <int> <int>
#> 1     4     1     2     3
#> 2     5     2     3     4
#> 3     6     3     4     5
#> 4     7     4     5     6
#> 5     8     5     6     7

# Extract it by name to get access to the "column", but the column itself is a data frame!
tibble2$col
#> # A tibble: 5 x 3
#>       a     b     c
#>   <int> <int> <int>
#> 1     1     2     3
#> 2     2     3     4
#> 3     3     4     5
#> 4     4     5     6
#> 5     5     6     7

# "unpack" that df-column into its individual columns
unpacked <- unpack(tibble2, col)
unpacked
#> # A tibble: 5 x 4
#>       d     a     b     c
#>   <int> <int> <int> <int>
#> 1     4     1     2     3
#> 2     5     2     3     4
#> 3     6     3     4     5
#> 4     7     4     5     6
#> 5     8     5     6     7

# Pack 3 columns into a new df-column
pack(unpacked, col = a:c)
#> # A tibble: 5 x 2
#>       d col$a    $b    $c
#>   <int> <int> <int> <int>
#> 1     4     1     2     3
#> 2     5     2     3     4
#> 3     6     3     4     5
#> 4     7     4     5     6
#> 5     8     5     6     7

You might be surprised to learn that the more popular unnest() function is actually implemented using unpack().

tidyr:::unnest.data.frame
#> function (data, cols, ..., keep_empty = FALSE, ptype = NULL, 
#>     names_sep = NULL, names_repair = "check_unique", .drop = "DEPRECATED", 
#>     .id = "DEPRECATED", .sep = "DEPRECATED", .preserve = "DEPRECATED") 
#> {
#>     cols <- tidyselect::eval_select(enquo(cols), data)
#>     if (nrow(data) == 0) {
#>         for (col in names(cols)) {
#>             data[[col]] <- as_empty_df(data[[col]], col = col)
#>         }
#>     }
#>     else {
#>         for (col in names(cols)) {
#>             data[[col]] <- map(data[[col]], as_df, col = col)
#>         }
#>     }
#>     data <- unchop(data, any_of(cols), keep_empty = keep_empty, 
#>         ptype = ptype)
#>     unpack(data, any_of(cols), names_sep = names_sep, names_repair = names_repair)
#> }
#> <bytecode: 0x7fca9edb3c80>
#> <environment: namespace:tidyr>
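The last two calls in that function body are the core of it: unchop() followed by unpack(). As a simplified sketch that skips the as_df() conversion and the deprecated arguments shown above, the same two steps can be run by hand on a small nested tibble. The data and the names step1 and step2 are made up for illustration.

```r
library(tibble)
library(tidyr)

df <- tibble(a = rep(1:2, each = 2), b = 1:4, c = 5:8)

# One row per value of `a`, with `data` as a list-column of tibbles
nested <- nest(df, data = c(b, c))

# Step 1: unchop() row-binds the list elements, turning the list-col into a df-col
step1 <- unchop(nested, data)
is.data.frame(step1$data)

# Step 2: unpack() flattens the df-col into ordinary columns
step2 <- unpack(step1, data)

# Compare the manual two-step result with unnest()
all.equal(step2, unnest(nested, data))
```

The intermediate step1 is exactly the kind of df-col discussed in this thread.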

In dplyr, the new across() function actually returns a data frame, and if you name the result in mutate() then you can create a df-col, even though it is normally used without naming the result.

library(dplyr)

tbl <- tibble(x = 1:5, y = 6:10)

# `across()` returns a data frame, which is "packed" into a df-col called `col`
tbl_packed <- mutate(tbl, col = across(x:y, ~.x - 1L, .names = "{.col}_minus_one"))
tbl_packed
#> # A tibble: 5 x 3
#>       x     y col$x_minus_one $y_minus_one
#>   <int> <int>           <int>        <int>
#> 1     1     6               0            5
#> 2     2     7               1            6
#> 3     3     8               2            7
#> 4     4     9               3            8
#> 5     5    10               4            9

# Effectively the same as:
mutate(tbl, col = tibble(x_minus_one = tbl$x - 1L, y_minus_one = tbl$y - 1L))
#> # A tibble: 5 x 3
#>       x     y col$x_minus_one $y_minus_one
#>   <int> <int>           <int>        <int>
#> 1     1     6               0            5
#> 2     2     7               1            6
#> 3     3     8               2            7
#> 4     4     9               3            8
#> 5     5    10               4            9

# The more popular way to use `across()` is to not name the result.
# This causes the data frame that `across()` returns to be "auto-unpacked"
# (i.e. it automatically does what `tidyr::unpack()` would do)
mutate(tbl, across(x:y, ~.x - 1L, .names = "{.col}_minus_one"))
#> # A tibble: 5 x 4
#>       x     y x_minus_one y_minus_one
#>   <int> <int>       <int>       <int>
#> 1     1     6           0           5
#> 2     2     7           1           6
#> 3     3     8           2           7
#> 4     4     9           3           8
#> 5     5    10           4           9

# Compare against:
tidyr::unpack(tbl_packed, col)
#> # A tibble: 5 x 4
#>       x     y x_minus_one y_minus_one
#>   <int> <int>       <int>       <int>
#> 1     1     6           0           5
#> 2     2     7           1           6
#> 3     3     8           2           7
#> 4     4     9           3           8
#> 5     5    10           4           9

So we've been finding some interesting places to use these ideas, and they are definitely not considered off-label usage!
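The same named versus unnamed rule applies, as far as I know, to any expression that returns a data frame, not only across(). Here is a minimal sketch with summarise(), assuming dplyr 1.0 or later. range_stats() is a hypothetical helper written for this example, not a dplyr function.

```r
library(dplyr)
library(tibble)

tbl <- tibble(x = 1:5, y = 6:10)

# Hypothetical helper that returns a one-row tibble
range_stats <- function(v) {
  tibble(min = min(v), max = max(v))
}

# Named result: the returned tibble is kept as a df-col called `x_stats`
named <- summarise(tbl, x_stats = range_stats(x))

# Unnamed result: the returned tibble is unpacked into columns `min` and `max`
unnamed <- summarise(tbl, range_stats(x))

names(named)
names(unnamed)
```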

Thank you for the detailed response! I'm always happy to learn of more features. I'll be thinking carefully about how to utilize this.

I'll also close my GitHub post :).
