I have been using read_csv for reading tables, and it has worked like a charm so far. The column type inference is very nice, since it saves me from having to do type conversions by hand.
Except when it behaves a bit too naively. Here is an example where some of the date columns are misinferred as dbl:
Warning message:
“One or more parsing issues, call `problems()` on your data frame for details, e.g.:
dat <- vroom(...)
problems(dat)”
Rows: 15052 Columns: 286
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ";"
dbl (285): min_date, max_date, ...
date (1): disease_start_date
I know what is triggering this: columns like min_date and max_date have NA rows. Only disease_start_date is complete enough to infer the pattern.
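For reference, the manual workaround would be to declare the affected columns explicitly and let the rest be guessed. This is only a minimal sketch: the inline table is a hypothetical stand-in for the real file, and the ISO date format (%Y-%m-%d) is an assumption about how the dates are written.

```r
library(readr)

# Hypothetical stand-in for the real file: the first rows of
# min_date and max_date are missing, as in the problem described above.
csv_text <- I("id;min_date;max_date;disease_start_date
1;NA;NA;2020-01-05
2;NA;NA;2020-02-10
3;2019-12-01;2020-03-01;2020-02-15
")

dat <- read_delim(
  csv_text,
  delim = ";",
  # Declare the date columns by hand (the format is an assumption);
  # every column not named here is still guessed.
  col_types = cols(
    min_date = col_date(format = "%Y-%m-%d"),
    max_date = col_date(format = "%Y-%m-%d"),
    .default = col_guess()
  )
)

# Show the column specification that was actually used.
spec(dat)
```

guess_max, which controls how many rows are used for guessing, is another argument of the same call that may help when the non-missing values only appear late in the file.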
Is there anything I can do to make the type inference a bit smarter?
Thanks!