reading/writing wide data frames

I've been using readr for reading/writing data frames. Obviously, as you add more data, reading and writing get slower. However, adding columns seems to be more costly than adding rows. For example, a 100k x 100 table is noticeably faster to read/write than a 100 x 100k table. I expected additional columns to add some overhead, but not to such a degree.

There are a lot of benchmarks for reading/writing text files, but they focus on long tables. Are there any tips or suggestions for wide tables?
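For anyone who wants to reproduce this, a minimal sketch of the comparison described above (assumes readr is installed; file names and sizes are illustrative):

```r
library(readr)

# Same total cell count, different shapes
long <- as.data.frame(matrix(runif(1e5 * 100), nrow = 1e5))  # 100k x 100
wide <- as.data.frame(matrix(runif(100 * 1e5), nrow = 100))  # 100 x 100k

system.time(write_csv(long, "long.csv"))
system.time(write_csv(wide, "wide.csv"))

system.time(invisible(read_csv("long.csv", show_col_types = FALSE)))
system.time(invisible(read_csv("wide.csv", show_col_types = FALSE)))
```

One likely contributor: readers guess a column type per column, so a 100k-column file means 100k type-inference passes, while extra rows just extend existing columns.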

have you tried data.table fread/fwrite? it's quite a bit faster than readr, but i'm not sure how it handles wide tables


I haven't tried yet. I didn't want to start testing different options if there is already a known solution. I definitely will if nothing comes up.

tidyverse/dplyr is fairly slow on wide tables ( http://www.win-vector.com/blog/2018/02/is-10000-cells-big/ ). I would definitely try data.table instead (a well-known, first-class package).
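To extend the sketch above to data.table for a direct comparison (assumes data.table is installed; sizes are illustrative):

```r
library(data.table)

wide <- as.data.table(matrix(runif(100 * 1e5), nrow = 100))  # 100 x 100k

# fwrite/fread are multithreaded, which often helps most on large files
system.time(fwrite(wide, "wide_dt.csv"))
system.time(invisible(fread("wide_dt.csv")))
```

Worth noting that even if fread is faster overall, the per-column overhead (type inference, column allocation) still applies, so the wide layout may remain relatively slower than the long one.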
