What is the best method to save and read data.frames with list-columns?


#1

write_csv() throws the error
"Don’t know how to handle vector of type list",
and write.csv() gives a similar result.

I’m also having problems with save/load - which TBH I rarely if ever use

Is there a recommended practice? I’m only talking about small datasets.


#2

The CSV format doesn’t have a standard way of handling the third “dimension” represented by the list column.

If you want human (semi-)interpretability, I would suggest dput/dget, which should be able to serialize any R data structure in a format that you could either read with dget or copy/paste into an assignment statement. Alternatively, saveRDS/readRDS would use a binary format.
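As a rough sketch of both round trips in base R only (the data frame, column names, and temp file paths below are made up for illustration, not taken from the original question):

```r
# Toy data frame with a list-column (illustrative names)
df <- data.frame(norm_col = 1:3)
df$list_col <- list(1:5, 6:10, 11:15)

# Text round trip: dput() writes R source code, dget() evaluates it back
txt_path <- tempfile(fileext = ".R")
dput(df, file = txt_path)
df_txt <- dget(txt_path)

# Binary round trip: saveRDS() serializes one object, readRDS() restores it
rds_path <- tempfile(fileext = ".rds")
saveRDS(df, rds_path)
df_rds <- readRDS(rds_path)

# Compare each restored copy with the original
all.equal(df, df_txt)
identical(df, df_rds)
```

The file written by dput() can be opened in any text editor, while the .rds file is only meant to be read back with readRDS().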

Finally, if you’re really set on a CSV file, you could do something like:

suppressPackageStartupMessages(library(tidyverse))

tib <- tibble(norm_col = 1:3,
              list_col = list(1:5, 6:10, 11:15))

tib %>% mutate(list_col = map_chr(list_col, ~ capture.output(dput(.)))) %>%
  write_csv("output.csv")

tib_saved <- read_csv("output.csv") %>%
  mutate(list_col = map(list_col, . %>% (rlang::parse_expr)() %>% eval()))
#> Parsed with column specification:
#> cols(
#>   norm_col = col_integer(),
#>   list_col = col_character()
#> )

all_equal(tib %>% unnest(),
          tib_saved %>% unnest())
#> [1] TRUE

I’m sure I’m making that more convoluted than necessary, but the point is, you can force anything into a CSV if you try hard enough. You might have trouble reading it without extensive notes, though. :wink:


#3

Thanks for the swift reply.
Just tried the dput/dget option and it works well on a toy example.


#4

I would highly recommend using saveRDS/readRDS over dput unless there is a really strong reason to use dput.


#5

You are correct. I forgot about that option, and in a real-case scenario it gave me an identical data.frame when I read the saved one back, whereas the dget one did not.
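The post above doesn't say what differed in the dget copy. One possible cause (an assumption, not something confirmed in this thread) is that dput() by default writes doubles with about 15 significant digits, so non-integer values may not come back bit-for-bit. A small sketch with made-up data:

```r
# Illustrative data: a list-column holding non-integer doubles
df <- data.frame(id = 1:3)
df$vals <- list(c(1/3, 2/3), pi, sqrt(2))

# Text round trip via dput()/dget() with default settings
txt_path <- tempfile(fileext = ".R")
dput(df, file = txt_path)
df_txt <- dget(txt_path)

# Binary round trip via saveRDS()/readRDS()
rds_path <- tempfile(fileext = ".rds")
saveRDS(df, rds_path)
df_rds <- readRDS(rds_path)

identical(df, df_rds)   # exact comparison against the binary copy
identical(df, df_txt)   # exact comparison against the text copy
all.equal(df, df_txt)   # comparison within numeric tolerance
```

If the text copy fails identical() but passes all.equal(), the difference is only rounding in the deparsed numbers.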


#6

dput is useful for generating test data for code examples, but not really for storing data.
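For the code-example use case, a minimal sketch might look like this (the data and names are invented for illustration): print the object with dput(), then paste the printed structure(...) call into a post as an assignment.

```r
# Small object you want others to be able to recreate (illustrative)
df <- data.frame(norm_col = 1:3)
df$list_col <- list(1:5, 6:10, 11:15)

# Prints a structure(...) call to the console; copy it into your post
# after "df <- " so readers can rebuild the object without any file
dput(df)

# Simulate the copy/paste step: deparse to text, then parse and evaluate it
code <- paste(deparse(df), collapse = "\n")
df_copy <- eval(parse(text = code))

all.equal(df, df_copy)
```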