So, will just one column be a list column, or will every (or almost every) column be a list column? The first case is more likely to save memory, because unnesting repeats the values of every non-list column once per list element. I would try it out with a subset of your data and check the results using `object.size()` or `pryr::object_size()`.
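For contrast, here is a minimal sketch of the other extreme, where every column is a list column (the names `df_all_list` and `df_all_long` are just for illustration). Unnesting has no plain column to repeat here, while the nested form pays some per-vector overhead for each of its 2,000 small vectors, so I'd expect the unnested version to come out somewhat smaller. Exact byte counts will depend on your R version and platform.

```r
library(tibble)
library(tidyr)

n <- 1000
# The "every column is a list column" case: each row holds 100 values
# in A and 100 values in B, and there is no plain column to repeat
df_all_list <- tibble(
  A = replicate(n, rnorm(100), simplify = FALSE),
  B = replicate(n, rnorm(100), simplify = FALSE)
)

# Unnest both list columns together: 100,000 rows x 2 double columns
df_all_long <- unnest(df_all_list, cols = c(A, B))

size_nested <- as.numeric(object.size(df_all_list))
size_long   <- as.numeric(object.size(df_all_long))

# A ratio below 1 means the unnested form is the smaller one
size_long / size_nested
```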
As a toy example, take a tibble with one integer column `A` and one list column `B` holding 100 doubles per row. Unnesting increases the data size by about 40%:
```r
suppressPackageStartupMessages(library(tidyverse))

# 1,000 rows; each element of B is a vector of 100 random doubles
df_list <- tibble(A = 1:1000L,
                  B = map(1:1000, ~ rnorm(100)))
df_list
#> # A tibble: 1,000 x 2
#>        A           B
#>    <int>      <list>
#>  1     1 <dbl [100]>
#>  2     2 <dbl [100]>
#>  3     3 <dbl [100]>
#>  4     4 <dbl [100]>
#>  5     5 <dbl [100]>
#>  6     6 <dbl [100]>
#>  7     7 <dbl [100]>
#>  8     8 <dbl [100]>
#>  9     9 <dbl [100]>
#> 10    10 <dbl [100]>
#> # ... with 990 more rows
object.size(df_list)
#> 852896 bytes

# One row per element of B, so each value of A is repeated 100 times
df_long <- unnest(df_list, cols = B)
df_long
#> # A tibble: 100,000 x 2
#>        A          B
#>    <int>      <dbl>
#>  1     1  1.7545358
#>  2     1 -0.2732362
#>  3     1  0.9484000
#>  4     1 -0.8999221
#>  5     1  1.3951232
#>  6     1  0.9915580
#>  7     1 -0.3650540
#>  8     1 -0.1489101
#>  9     1  1.4596137
#> 10     1  1.5815404
#> # ... with 99,990 more rows
object.size(df_long)
#> 1200896 bytes
```
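Where the extra ~40% comes from can be worked out from the two byte counts. This is a back-of-the-envelope sketch, assuming both tibbles carry attributes of about the same size (names, class, row names), so the whole difference is the repeated `A` values minus the overhead of storing `B` as 1,000 separate vectors:

```r
# Byte counts reported by object.size() in the example above
size_list <- 852896
size_long <- 1200896

# Relative growth from unnesting: about 0.408, i.e. roughly 40%
growth <- (size_long - size_list) / size_list

# Unnesting repeats each value of A once per element of B:
# 1,000 integers become 100,000 integers at 4 bytes each
extra_A <- (100000 - 1000) * 4                    # 396,000 bytes

# B holds the same 100,000 doubles (800,000 bytes) in both forms, so the
# gap between extra_A and the observed growth is what the nested B spends
# on keeping 1,000 separate vectors
overhead_B <- extra_A - (size_long - size_list)   # 48,000 bytes
overhead_B / 1000                                 # 48 bytes per vector
```

About 48 bytes per element is in the ballpark of an R vector header on a 64-bit build, which is why the savings from nesting shrink as the list elements get shorter.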
So keeping the list column might help, but as I said, you should test it on your own data to see whether it's worthwhile in your case.
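One way to run that test is a small helper like the sketch below. `compare_nesting_size()` is a hypothetical name, not part of any package; it takes the first `n` rows, unnests the given list column(s), and returns both sizes from `object.size()`.

```r
library(tibble)
library(tidyr)

# Hypothetical helper (not part of any package): compare the memory used
# by a nested tibble and its unnested form on the first n rows only
compare_nesting_size <- function(df, cols, n = 1000) {
  sub  <- head(df, n)
  # {{ cols }} forwards the bare column name(s) to unnest()
  long <- unnest(sub, cols = {{ cols }})
  c(nested = as.numeric(object.size(sub)),
    long   = as.numeric(object.size(long)))
}
```

For the toy data, `compare_nesting_size(df_list, B)` repeats the comparison above on the first 1,000 rows. Taking the first rows keeps it cheap; if the list lengths vary a lot across rows, a random sample of rows would be more representative. You could also swap in `pryr::object_size()`, which counts shared memory only once.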