Skip columns and specify column types in readr without warnings

nacnudus · February 28, 2019, 8:15pm

Is it possible to specify col_types for a subset of columns and only import those columns, without getting warnings? I don't know in advance how many columns there will be, so I can't name them.

I have tried the following:

library(readr)

csv <- "foo,bar,baz\n1,2,3"

read_csv(csv, col_types = "nn")
#> Warning: Unnamed `col_types` should have the same length as `col_names`.
#> Using smaller of the two.
#> Warning: 1 parsing failure.
#> row col  expected    actual         file
#>   1  -- 2 columns 3 columns literal data
#> # A tibble: 1 x 2
#>     foo   bar
#>   <dbl> <dbl>
#> 1     1     2

read_csv(csv,
         col_types = cols(col_number(), col_number(), .default = col_skip()))
#> Warning: Unnamed `col_types` should have the same length as `col_names`. Using smaller of the two.

#> Warning: 1 parsing failure.
#> row col  expected    actual         file
#>   1  -- 2 columns 3 columns literal data
#> # A tibble: 1 x 2
#>     foo   bar
#>   <dbl> <dbl>
#> 1     1     2

read_csv(csv,
         skip = 1,
         col_names = c("one", "two"),
         col_types = cols(col_number(), col_number(), .default = col_skip()))
#> Warning: 1 parsing failure.
#> row col  expected    actual         file
#>   1  -- 2 columns 3 columns literal data
#> # A tibble: 1 x 2
#>     one   two
#>   <dbl> <dbl>
#> 1     1     2

Created on 2019-02-28 by the reprex package (v0.2.0.9000).

Session info

devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.5.2 (2018-12-20)
#>  os       Arch Linux                  
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language                             
#>  collate  en_NZ.UTF-8                 
#>  ctype    en_GB.UTF-8                 
#>  tz       Europe/London               
#>  date     2019-02-28                  
#> 
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package     * version    date       lib source                           
#>  assertthat    0.2.0      2017-04-11 [1] CRAN (R 3.5.0)                   
#>  backports     1.1.3      2018-12-14 [1] CRAN (R 3.5.2)                   
#>  callr         3.1.1      2018-12-21 [1] CRAN (R 3.5.2)                   
#>  cli           1.0.1      2018-09-25 [1] CRAN (R 3.5.1)                   
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.5.0)                   
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 3.5.0)                   
#>  devtools      2.0.1.9000 2019-01-28 [1] Github (r-lib/devtools@e4e57aa)  
#>  digest        0.6.18     2018-10-10 [1] CRAN (R 3.5.1)                   
#>  evaluate      0.12       2018-10-09 [1] CRAN (R 3.5.1)                   
#>  fansi         0.4.0      2018-11-09 [1] Github (brodieG/fansi@ab11e9c)   
#>  fs            1.2.6      2018-08-23 [1] CRAN (R 3.5.2)                   
#>  glue          1.3.0.9000 2019-01-28 [1] Github (tidyverse/glue@8188cea)  
#>  highr         0.7        2018-06-09 [1] CRAN (R 3.5.1)                   
#>  hms           0.4.2.9001 2019-02-28 [1] Github (tidyverse/hms@16ff76e)   
#>  htmltools     0.3.6      2017-04-28 [1] CRAN (R 3.5.0)                   
#>  knitr         1.21       2018-12-10 [1] CRAN (R 3.5.1)                   
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 3.5.0)                   
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 3.5.0)                   
#>  nvimcom     * 0.9-75     2019-01-03 [1] local                            
#>  pillar        1.3.1.9000 2019-01-23 [1] Github (r-lib/pillar@3a54b8d)    
#>  pkgbuild      1.0.2      2018-10-16 [1] CRAN (R 3.5.1)                   
#>  pkgconfig     2.0.2      2018-08-16 [1] CRAN (R 3.5.1)                   
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.5.1)                   
#>  prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.5.0)                   
#>  processx      3.2.1      2018-12-05 [1] CRAN (R 3.5.1)                   
#>  ps            1.3.0      2018-12-21 [1] CRAN (R 3.5.2)                   
#>  R6            2.4.0      2019-02-14 [1] CRAN (R 3.5.2)                   
#>  Rcpp          1.0.0      2018-11-07 [1] CRAN (R 3.5.2)                   
#>  readr       * 1.3.1.9000 2019-02-28 [1] Github (tidyverse/readr@b7e0b99) 
#>  remotes       2.0.2      2018-10-30 [1] CRAN (R 3.5.2)                   
#>  rlang         0.3.1      2019-01-08 [1] CRAN (R 3.5.2)                   
#>  rmarkdown     1.11       2018-12-08 [1] CRAN (R 3.5.1)                   
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.5.0)                   
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.5.1)                   
#>  stringi       1.3.1      2019-02-13 [1] CRAN (R 3.5.2)                   
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 3.5.2)                   
#>  testthat      2.0.1      2018-10-13 [1] CRAN (R 3.5.2)                   
#>  tibble        2.0.1.9001 2019-02-28 [1] Github (tidyverse/tibble@92f5604)
#>  usethis       1.4.0      2018-08-14 [1] CRAN (R 3.5.1)                   
#>  utf8          1.1.4      2018-05-24 [1] CRAN (R 3.5.0)                   
#>  withr         2.1.2      2018-03-15 [1] CRAN (R 3.5.0)                   
#>  xfun          0.4        2018-10-23 [1] CRAN (R 3.5.1)                   
#>  yaml          2.2.0      2018-07-25 [1] CRAN (R 3.5.1)                   
#> 
#> [1] /home/nacnudus/R/x86_64-pc-linux-gnu-library/3.5
#> [2] /usr/lib/R/library

jdlong · February 28, 2019, 8:27pm

If you're getting the right functionality but just don't like the warnings, you could wrap your call in suppressWarnings().
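
A minimal sketch of that suggestion, reusing the `csv` string from the question. The variable name `result` is only illustrative. Note that suppressWarnings() hides every warning raised by the call, including genuine parsing failures, so it may be worth checking problems() afterwards.

```r
library(readr)

csv <- "foo,bar,baz\n1,2,3"

# Wrap the call so that no warnings from read_csv() reach the console.
# The first two columns are parsed as numbers; anything else is skipped.
result <- suppressWarnings(
  read_csv(csv,
           col_types = cols(col_number(), col_number(), .default = col_skip()))
)

# Recorded parsing problems (if any) can still be inspected afterwards.
problems(result)
```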

jimhester · March 1, 2019, 9:13pm

I think ideally we might relax this constraint and let you provide a list of unnamed column types in the future.

However, maybe there is a different way to tackle the problem. How are you determining how many columns you want to read if you don't know how many columns there are?

nacnudus · March 2, 2019, 10:39am

The order of the existing columns is guaranteed, but new columns are sometimes added at the end. So I know which columns I need, but not how many dummy column names to create. For now, I create dummy names for the columns that will be skipped anyway, e.g.

# no_of_cols is the total number of columns in the file, determined beforehand
read_csv(csv,
         skip = 1,
         col_names = c("one", "two", paste0("dummy", seq_len(no_of_cols - 2))),
         col_types = cols(col_number(), col_number(), .default = col_skip()))
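
The snippet above does not show where `no_of_cols` comes from. One possible way to get it, sketched here under the assumption that the first line of the file is the header, is readr's count_fields() with tokenizer_csv(). Once the column count is known, a compact `col_types` string with one character per column ("n" for number, "_" for skip) avoids the need for dummy names altogether. The variable `types` is illustrative and not from the thread.

```r
library(readr)

csv <- "foo,bar,baz\n1,2,3"

# Count the fields on the first line only (assumed to be the header).
no_of_cols <- count_fields(csv, tokenizer_csv(), n_max = 1)

# Build a compact specification with one character per column:
# "n" = col_number(), "_" = col_skip().
# Assumes the file has at least the two wanted columns.
types <- paste0("nn", strrep("_", no_of_cols - 2))

# The specification now has the same length as the header,
# so the original column names can be kept.
result <- read_csv(csv, col_types = types)
result
```

Because the specification is rebuilt from the file on each read, columns appended later are skipped without changing the code.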
