When and where to use the column specification information from "See spec(...) for full column specifications"?

The readr package provides a set of useful tools for importing data.

Each time we use one of them, the message "See spec(...) for full column specifications" shows up.

Calling spec(df) (or spec() on any other object returned by a readr function) then prints the full column specification.

The question is: when and where should this information be used?

You can use it, for example, when you have multiple files with the same structure. Without a spec, readr tries to guess the data type of every column each time it reads a file. That guessing takes time, and it is wasted effort if you already have a spec.
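As a rough sketch of that first use case: let readr guess once, capture the result with spec(), and pass it to col_types for the next file. The file contents and the names path_a, path_b and file_spec are made up for illustration.

```r
library(readr)

# Two small files with the same structure (hypothetical example data).
path_a <- tempfile(fileext = ".csv")
path_b <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,a,2.5", "2,b,3.5"), path_a)
writeLines(c("id,name,score", "3,c,4.5", "4,d,5.5"), path_b)

# Let readr guess the column types once, then capture what it settled on.
first <- read_csv(path_a)
file_spec <- spec(first)

# Reuse the captured spec so the second file is read without guessing.
second <- read_csv(path_b, col_types = file_spec)
```

Printing file_spec also gives you a cols(...) call that you can paste into a script, so the spec does not have to be rebuilt from a sample file every time.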

Another use case is to take this spec and update it to suit your needs. This is useful when readr can't guess a data type correctly (e.g., datetimes, since their formats are a mess). You already have a complete spec, so you only need to change the one column that was guessed wrong.

There are probably other use cases; these two are the ones I have come across in my work.
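A minimal sketch of updating a guessed spec, assuming the spec object can be modified through its cols element (that is how it behaves in the readr versions I have used, but treat it as an assumption). The column names, the datetime format, and the variable names are hypothetical.

```r
library(readr)

# Hypothetical file whose datetime column is in day/month/year order.
path <- tempfile(fileext = ".csv")
writeLines(c("id,when", "1,31/12/2020 23:59", "2,01/01/2021 00:01"), path)

# First pass: let readr guess and keep the resulting spec.
guessed <- read_csv(path)
my_spec <- spec(guessed)

# Override only the one problematic column; the rest of the spec is kept.
my_spec$cols$when <- col_datetime(format = "%d/%m/%Y %H:%M")

# Second pass: read with the corrected spec.
fixed <- read_csv(path, col_types = my_spec)
```

If modifying the spec object in place feels fragile, an alternative is to print the spec, copy the cols(...) call into your script, and edit the one column there.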

You pass it to the col_types argument. The main reason to use a specification is safety.

If your data does not have the types you expect, or if the data changes unexpectedly in the future, the code will fail early on with an informative error rather than seeming to work fine and breaking further on.
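Here is a small sketch of that safety pattern. As far as I know, readr by default reports a type mismatch as a parsing warning (and fills in NA) rather than stopping, so this example adds stop_for_problems() to turn the mismatch into the hard failure described above. The helper read_checked, the spec expected, and the file contents are hypothetical.

```r
library(readr)

# The types we expect every incoming file to have.
expected <- cols(id = col_integer(), score = col_double())

# One well-formed file and one where a score is not a number.
good <- tempfile(fileext = ".csv")
bad <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,2.5", "2,3.5"), good)
writeLines(c("id,score", "1,2.5", "2,oops"), bad)

# Read with the fixed spec and stop if anything failed to parse.
read_checked <- function(path) {
  df <- read_csv(path, col_types = expected)
  stop_for_problems(df)
  df
}

ok <- read_checked(good)

# The bad file raises an error at read time; capture its message here.
result <- tryCatch(read_checked(bad), error = function(e) conditionMessage(e))
```

Calling problems() on the data frame returned by read_csv() shows which rows and columns failed to parse, which helps when tracking down what changed in the data.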
