Issue in core R, utils

I wanted to ask the community whether the behaviour I've come across is a bug or expected behaviour.

I'm using count.fields from the core utils package.

lines <- c(
  "one, \"Sentence on 
  
  a few lines\"",
  "'three four'")
writeLines(lines, "test.txt")

count.fields("test.txt", sep=",",quote="\"")

Expected outcome: [1] 2 1

Actual outcome: [1] NA NA 2 1

The function follows return/newline characters inside the defined quotes, whereas I would expect the correct behaviour to be to ignore the characters inside the quotes.

I believe this is counterintuitive and goes against other functions such as read_csv.

So from reading the documentation, this sounds like expected behavior:

Consistent with scan, count.fields allows quoted strings to contain newline characters. In such a case the starting line will have the field count recorded as NA, and the ending line will include the count of all fields from the beginning of the record.

So line 1 is your starting line and is recorded as NA. Line 2 is still not the end, so I guess it gets NA as well. Then line 3 terminates the multi-line quoted string, so it gets the total field count of the whole record (the last 3 lines): 2. Line 4 is an ordinary single-field line, so it gets 1.
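To make that mapping concrete, here is a small sketch that lines up each physical line of the file with the count it receives. It reuses the data from the original post; writing to a temporary file and the data frame layout are my own choices for illustration.

```r
# Recreate the file from the original post in a temporary location.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "one, \"Sentence on ",   # line 1: opens a quoted field
  "  ",                    # line 2: still inside the quotes
  "  a few lines\"",       # line 3: closes the quoted field
  "'three four'"           # line 4: ordinary line, single quotes are not quote characters here
), path)

counts <- count.fields(path, sep = ",", quote = "\"")

# One row per physical line: continuation lines show NA, and the line
# that closes the quoted string carries the count for the whole record.
data.frame(line = seq_along(counts), text = readLines(path), fields = counts)
```

The `fields` column should come out as NA, NA, 2, 1, which matches the actual outcome reported above.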

I'm not sure how this relates to read_csv. Since read_csv is a Tidyverse function, though, I think we should not expect it to behave like base or utils.


You could throw an na.omit() to strip them out and get your desired output.
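A minimal sketch of that suggestion, using the same data as the original post written to a temporary file. Wrapping the result in as.integer() is my own addition, there only to drop the attributes that na.omit() attaches.

```r
# Same file contents as in the original post.
path <- tempfile(fileext = ".txt")
writeLines(c("one, \"Sentence on ", "  ", "  a few lines\"", "'three four'"), path)

counts <- count.fields(path, sep = ",", quote = "\"")

# Drop the NA entries that mark continuation lines of a multi-line record.
per_record <- as.integer(na.omit(counts))
per_record

# Equivalent without na.omit():
counts[!is.na(counts)]
```

This gives one count per record (2 and 1) rather than one per physical line.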

Thanks.

Why the read_csv link? My issue relates to the following example.

Assume a CSV file with 4 columns.

If I have a CSV that is failing to import correctly, the first step would be to check whether it's correctly formatted or malformed. I had assumed you could run count.fields and then filter the result to show which rows != 4 (as every row should contain 3 commas, i.e. 4 fields).

But in this example, if one column has quoted text that includes return/newline characters, this approach wouldn't work, which in my view is counterintuitive.
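For what it's worth, the check can probably still be done if the NA entries are treated as continuation lines rather than as failures. The sketch below is only an illustration under that assumption: the file contents, the `expected` value and the `bad_lines` name are made up for the example, and it relies on the documented behaviour that the line closing a multi-line quoted field carries the count for the whole record.

```r
# Hypothetical 4-column CSV: one record spans two lines inside quotes,
# and the last record is genuinely malformed (only 3 fields).
path <- tempfile(fileext = ".csv")
writeLines(c(
  "a,b,c,d",
  "1,\"multi",
  "line\",3,4",
  "5,6,7"
), path)

counts <- count.fields(path, sep = ",", quote = "\"")
expected <- 4L  # 4 columns, i.e. 3 separators per record

# Skip NA (continuation) lines; flag only lines that end a record
# with the wrong number of fields.
bad_lines <- which(!is.na(counts) & counts != expected)
bad_lines
```

Here only physical line 4 is flagged. The multi-line record is counted once, on the line where its quoted field ends, so it passes the check.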
