Nested <tibble> (as opposed to a <list>-col of tibbles)

I'm curious about the potential pitfalls of storing a tibble directly as a column of another tibble, as opposed to storing a list-column of tibbles.
tibble seems not to want the former, as seen here:

 > foo <- mtcars
 > foo$foo <- foo
 > str(foo)
 'data.frame':   32 obs. of  12 variables:
  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
  $ disp: num  160 160 108 258 360 ...
  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
  $ qsec: num  16.5 17 18.6 19.4 17 ...
  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
  $ foo :'data.frame':   32 obs. of  11 variables:
   ..$ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
   ..$ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
   ..$ disp: num  160 160 108 258 360 ...
   ..$ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
   ..$ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
   ..$ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
   ..$ qsec: num  16.5 17 18.6 19.4 17 ...
   ..$ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
   ..$ am  : num  1 1 1 0 0 0 0 0 0 0 ...
   ..$ gear: num  4 4 4 3 3 3 3 4 4 4 ...
   ..$ carb: num  4 4 1 1 2 1 4 2 2 4 ...
 > as_tibble(foo)
 Error: Column `foo` must be a 1d atomic vector or a list

However, this works:

 > foo <- as_tibble(mtcars)
 > foo$foo <- foo
 > foo
 # A tibble: 32 x 12
      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb foo
  * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <tibble>
  1  21.0  6.00   160 110    3.90  2.62  16.5  0     1.00  4.00  4.00 c(21, 21, 22.8, 21.4, 18.7, 18.…
  2  21.0  6.00   160 110    3.90  2.88  17.0  0     1.00  4.00  4.00 c(6, 6, 4, 6, 8, 6, 8, 4, 4, 6,…
  3  22.8  4.00   108  93.0  3.85  2.32  18.6  1.00  1.00  4.00  1.00 c(160, 160, 108, 258, 360, 225,…
  4  21.4  6.00   258 110    3.08  3.22  19.4  1.00  0     3.00  1.00 c(110, 110, 93, 110, 175, 105, …
  5  18.7  8.00   360 175    3.15  3.44  17.0  0     0     3.00  2.00 c(3.9, 3.9, 3.85, 3.08, 3.15, 2…
  6  18.1  6.00   225 105    2.76  3.46  20.2  1.00  0     3.00  1.00 c(2.62, 2.875, 2.32, 3.215, 3.4…
  7  14.3  8.00   360 245    3.21  3.57  15.8  0     0     3.00  4.00 c(16.46, 17.02, 18.61, 19.44, 1…
  8  24.4  4.00   147  62.0  3.69  3.19  20.0  1.00  0     4.00  2.00 c(0, 0, 1, 1, 0, 1, 0, 1, 1, 1,…
  9  22.8  4.00   141  95.0  3.92  3.15  22.9  1.00  0     4.00  2.00 c(1, 1, 1, 0, 0, 0, 0, 0, 0, 0,…
 10  19.2  6.00   168 123    3.92  3.44  18.3  1.00  0     4.00  4.00 c(4, 4, 4, 3, 3, 3, 3, 4, 4, 4,…
 # ... with 22 more rows

And the nested foo column is perfectly preserved as a tibble:

 > foo$foo
 # A tibble: 32 x 11
      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
  * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
  1  21.0  6.00   160 110    3.90  2.62  16.5  0     1.00  4.00  4.00
  2  21.0  6.00   160 110    3.90  2.88  17.0  0     1.00  4.00  4.00
  3  22.8  4.00   108  93.0  3.85  2.32  18.6  1.00  1.00  4.00  1.00
  4  21.4  6.00   258 110    3.08  3.22  19.4  1.00  0     3.00  1.00
  5  18.7  8.00   360 175    3.15  3.44  17.0  0     0     3.00  2.00
  6  18.1  6.00   225 105    2.76  3.46  20.2  1.00  0     3.00  1.00
  7  14.3  8.00   360 245    3.21  3.57  15.8  0     0     3.00  4.00
  8  24.4  4.00   147  62.0  3.69  3.19  20.0  1.00  0     4.00  2.00
  9  22.8  4.00   141  95.0  3.92  3.15  22.9  1.00  0     4.00  2.00
 10  19.2  6.00   168 123    3.92  3.44  18.3  1.00  0     4.00  4.00
 # ... with 22 more rows

Q: Why would I ever do this in the first place?
A: Really just for namespace management when dealing with some external APIs.

I realize that for this to work, the nested tibble must have the same number of rows as the surrounding tibble, but this can be checked. Indeed, it appears to be checked already: I get an error if I replace the above assignment with foo$foo <- head(foo).
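To make the row-count constraint concrete, here is a minimal sketch using plain base-R data frames rather than tibbles, on the assumption that the same invariant applies there. The names `dat`, `meta`, and `short` are made up for illustration.

```r
# Hypothetical outer data frame with four rows.
dat <- data.frame(id = 1:4)

# A data frame column with the same number of rows is accepted.
dat$meta <- data.frame(a = c(10, 20, 30, 40), b = c("w", "x", "y", "z"))

# A data frame column with fewer rows raises an error instead of being recycled.
res <- tryCatch(
  {
    dat$short <- head(dat$meta, 3)
    "accepted"
  },
  error = function(e) "rejected"
)
print(res)
```

Because the failed assignment never completes, `dat` is left with only the `id` and `meta` columns.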

The 'traditional' approach of nesting data frames into a list-col can work here, too, but then I just have a list of single-row data frames, which seems a bit silly.
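For comparison, this is roughly what the two layouts look like side by side. It is only a base-R sketch with made-up names (`dat`, `meta`, `meta_list`), not a claim about how tidyr or tibble build list-columns internally.

```r
dat <- data.frame(id = 1:3)
meta <- data.frame(a = c(10, 20, 30), b = c("x", "y", "z"))

# Layout 1: data frame column; one nested data frame spanning all rows.
dat$meta <- meta

# Layout 2: list-column; one single-row data frame per row of `dat`.
dat$meta_list <- split(meta, seq_len(nrow(meta)))

# Column-wise access is direct in layout 1 ...
a_from_df_col <- dat$meta$a

# ... but needs a loop over the list elements in layout 2.
a_from_list_col <- unname(vapply(dat$meta_list, function(d) d$a, numeric(1)))
```

Both routes recover the same values, but the list-column pays for it with one tiny data frame per row.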

Flattening (e.g. as done with jsonlite's flatten option) also works, but when the fields' names are outside our control, figuring out a non-conflicting new naming scheme for them can be annoying.
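A small sketch of the naming problem, again in base R and with an invented payload (`event`, `user`, `repo` are hypothetical field names). The flattening is done by hand with a dotted prefix, which is only meant to resemble what a flatten step would produce.

```r
# Hypothetical API payload where two sub-objects both expose `id` and `name`.
dat <- data.frame(event = c("push", "fork"))
dat$user <- data.frame(id = c(1L, 2L), name = c("ann", "bob"))
dat$repo <- data.frame(id = c(10L, 20L), name = c("tibble", "dplyr"))

# Nested: the sub-frame name acts as a namespace, so nothing collides.
nested_names <- names(dat)   # "event" "user" "repo"
user_ids <- dat$user$id
repo_ids <- dat$repo$id

# Flattened by hand: a prefix is needed to keep the two `id` columns apart.
flat <- cbind(
  dat["event"],
  setNames(dat$user, paste0("user.", names(dat$user))),
  setNames(dat$repo, paste0("repo.", names(dat$repo)))
)
flat_names <- names(flat)
```

With the nested layout the original field names survive untouched, whereas the flat layout depends on the prefix scheme never clashing with a real field name.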

So, is this pattern highly discouraged in some way?
It seems much of tibble's tooling discourages it, but it's not clear to me exactly why.
