readr 2.0.0 (now built on vroom) doesn't seem to recognise files whose lines are delimited by <CR> alone instead of <CR><LF>.

library(tidyverse)

# this fails with readr 2.0.0
# make a string of 0123456789<CR> and copy it 13108 times
write.string = intToUtf8(rep(c(seq(48,57),c(13)), 13108))
write.filename = file("test.txt", "wb")
writeBin(write.string, write.filename)
close(write.filename)

# this generates a vroom error
df.test <- readr::read_csv('test.txt',  col_types = cols(.default = "c"))

# but this works
# make a string of 0123456789<CR><LF> and copy it 13108 times
write.string = intToUtf8(rep(c(seq(48,57),c(13,10)), 13108))
# (then write it to test.txt and read it with read_csv() exactly as above)

The error message is:

Error: The size of the connection buffer (131072) was not large enough
to fit a complete line:

  • Increase it by setting Sys.setenv("VROOM_CONNECTION_SIZE")

This was mentioned in the release notes for readr 2.0.0:

  • Normalizing newlines in files with just carriage returns \r is no longer supported. The last major OS to use only CR as the newline was 'classic' Mac OS, which had its final release in 2001.

Unfortunately, despite this convention not having been used in any major OS for over two decades, Microsoft Excel for macOS still outputs CSVs in this style, so we will likely have to bring back support for it.
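To check whether a particular file uses CR-only line endings, the raw bytes can be inspected with base R. The following is a minimal sketch, not part of readr; the helper name `count_line_endings` is hypothetical, and it assumes an ASCII-compatible encoding such as UTF-8 and a file small enough to load into memory.

```r
# Hypothetical helper (not part of readr): count line-ending styles in a file.
count_line_endings <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  is_cr <- bytes == as.raw(0x0d)
  is_lf <- bytes == as.raw(0x0a)
  # A CR immediately followed by an LF is one Windows-style CRLF ending.
  next_is_lf <- c(is_lf[-1], FALSE)
  crlf <- sum(is_cr & next_is_lf)
  c(crlf = crlf, cr_only = sum(is_cr) - crlf, lf_only = sum(is_lf) - crlf)
}

# Example: a small CR-only file written as raw bytes.
path <- tempfile(fileext = ".txt")
writeBin(charToRaw("a,b\r1,2\r3,4\r"), path)
count_line_endings(path)
```

A non-zero `cr_only` count alongside zero `crlf` and `lf_only` suggests the file is in the classic Mac OS style that readr 2.0.0 no longer normalizes.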

For now you can use the first edition parser to read the file:

write.string = intToUtf8(rep(c(seq(48,57),c(13)), 10))
write.filename = file("test.txt", "wb")
writeBin(write.string, write.filename)
close(write.filename)

library(readr)
with_edition(1, readr::read_csv("test.txt"))
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   `0123456789` = col_character()
#> )
#> Warning: 1 parsing failure.
#> row        col expected        actual       file
#>  10 0123456789          embedded null 'test.txt'
#> # A tibble: 10 × 1
#>    `0123456789`
#>    <chr>       
#>  1 "0123456789"
#>  2 "0123456789"
#>  3 "0123456789"
#>  4 "0123456789"
#>  5 "0123456789"
#>  6 "0123456789"
#>  7 "0123456789"
#>  8 "0123456789"
#>  9 "0123456789"
#> 10 ""

Created on 2021-08-24 by the reprex package (v2.0.0)
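Another option that does not depend on the readr edition is to convert lone CR bytes to LF before reading. The sketch below uses only base R; the helper name `normalise_cr` is hypothetical, and it assumes an ASCII-compatible encoding and no carriage returns inside quoted fields (those would be converted too). The sample file is written with `charToRaw()` because `writeBin()` on a character string appends a trailing nul byte, which is probably where the "embedded null" warning above comes from; that is an inference, not something confirmed in this thread.

```r
# Hypothetical helper (not part of readr): rewrite lone CR line endings as LF.
normalise_cr <- function(path_in, path_out) {
  bytes <- readBin(path_in, what = "raw", n = file.size(path_in))
  is_cr <- bytes == as.raw(0x0d)
  next_is_lf <- c(bytes[-1] == as.raw(0x0a), FALSE)
  # Leave CRLF pairs alone; only a CR that is not followed by LF is replaced.
  bytes[is_cr & !next_is_lf] <- as.raw(0x0a)
  writeBin(bytes, path_out)
  invisible(path_out)
}

cr_file <- tempfile(fileext = ".csv")
lf_file <- tempfile(fileext = ".csv")
# Writing raw bytes avoids the nul terminator that writeBin() adds to strings.
writeBin(charToRaw("x,y\r1,2\r3,4\r"), cr_file)
normalise_cr(cr_file, lf_file)
read.csv(lf_file)
```

The converted file should then also be readable with `readr::read_csv()`; `read.csv()` is used here only to keep the sketch free of package dependencies.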


Many thanks, and my apologies: I missed that bit in the release notes.

A colleague installed a new version of tidyverse and an old file suddenly wouldn't read for her but would for me, so it took a while to track down the cause, and I probably skipped a bit more reading than I should have. On the bright side, it led me to discover the hexView package, which is just brilliant.

Thanks for the workaround, appreciate that.

Suzanne
