Our data submitters have varying levels of technical skill. To accommodate them all, a CSV with some number of extraneous columns is accepted. This matters in particular for some Excel exports that have a couple dozen empty columns tacked on.
The order and meaning of the first columns, which are the ones to be read, are well enforced. The column names vary somewhat, but the order and meaning are constant. The rest is laissez-faire content, ranging from blank columns to irrelevant data.
Reading in a CSV with extraneous columns results in a warning message. This is undesirable because the warnings are useless and every warning emitted is logged.
Is there a way to read just the first few columns and assign them their types, without generating the warnings about expected vs. actual column counts and about col_types needing to be the same length as col_names?
    Warning: 2 parsing failures.
    row col  expected    actual         file
      1  -- 3 columns 5 columns literal data
      2  -- 3 columns 5 columns literal data
The interim solution has been to suppress warnings, but this is suboptimal because it also suppresses warnings of actual interest, such as "no trailing characters .0" when some flat-file editor or setting helpfully decides that the user really wanted a floating-point representation instead of integers.
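To make concrete what a more selective version of that workaround might look like, here is a minimal sketch in base R. It assumes the parsing failures are available as a table with `expected` and `actual` columns, as in the warning above (readr's `problems()` returns a table of that shape). The helper name `drop_column_count_problems` and the hand-built `probs` table are hypothetical and only for illustration; the wiring into `read_csv()` is left as a comment because it has not been verified here.

```r
# Hypothetical helper: drop the "N columns" rows from a problems table and
# keep everything else (for example type-parsing failures).
drop_column_count_problems <- function(probs) {
  is_count <- grepl("^[0-9]+ columns$", probs$expected) &
    grepl("^[0-9]+ columns$", probs$actual)
  probs[!is_count, , drop = FALSE]
}

# Hand-built stand-in for a problems table (illustrative values only):
# two column-count rows like the warning above, plus one row for the
# "no trailing characters .0" case.
probs <- data.frame(
  row      = c(1L, 2L, 2L),
  col      = c(NA, "bar", NA),
  expected = c("3 columns", "no trailing characters", "3 columns"),
  actual   = c("5 columns", ".0", "5 columns"),
  stringsAsFactors = FALSE
)

remaining <- drop_column_count_problems(probs)
print(remaining)

# Possible use with readr (an assumption, not run here):
#   dat <- suppressWarnings(readr::read_csv(extra_blanks, col_types = col_spec))
#   remaining <- drop_column_count_problems(readr::problems(dat))
#   if (nrow(remaining) > 0) warning("parsing failures of interest remain")
```

This would still rely on suppressing warnings at read time, so it is a refinement of the interim solution rather than a way to avoid the warnings in the first place.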
    library(readr)

    # Input data with extra unwanted bits.
    extra_blanks <- "foo,bar,baz,,,\naaa,100,ccc,,,\naaa,200.0,ccc,,,\n"
    extra_garbage <- "foo,bar,baz,xtra,xtra\naaa,100,ccc,xtra,xtra\naaa,200,ccc,xtra,xtra\n"

    # Read input data with extraneous bits using column specification
    col_spec <- cols(col_character(), col_integer(), col_character(), .default = col_skip())
    readr::read_csv(extra_blanks, col_types = col_spec)
    readr::read_csv(extra_garbage, col_types = col_spec)

    # Read input data with extraneous bits using cols_only
    col_spec <- cols_only(col_character(), col_integer(), col_character())
    readr::read_csv(extra_blanks, col_types = col_spec)
    readr::read_csv(extra_garbage, col_types = col_spec)

    # A tibble: 2 x 3
    #   foo     bar baz
    #   <chr> <int> <chr>
    # 1 aaa     100 ccc
    # 2 aaa     200 ccc