New to R, need help with parsing problems when dumping data

siobhang · July 20, 2021, 4:14pm

I have been going through the geonames tutorial (Mapping and Monitoring with Geonames).

When I run the part of the code to download the zipfile from the site:

library(readr)
temp <- tempfile()
download.file("http://download.geonames.org/export/dump/KE.zip", temp)
KE <- unz(temp, "KE.txt")
kenya_geodump <- read_tsv(KE, col_names = FALSE)

I get this warning:

Warning: 7 parsing failures.
row col expected actual file
11910 X14 1/0/T/F/TRUE/FALSE 189861
23320 X14 1/0/T/F/TRUE/FALSE 8658052
23767 X14 1/0/T/F/TRUE/FALSE 7867515
23773 X14 1/0/T/F/TRUE/FALSE 7867522
23786 X14 1/0/T/F/TRUE/FALSE 7867542
..... ... .................. ....... ............
See problems(...) for more details.

I can't figure out what to do next.

nirgrahamuk · July 20, 2021, 4:34pm

read_tsv reads a limited number of rows (1000 by default, controlled by guess_max) to guess the column types. It then reads the rest of the file and reports a problem wherever later data is not compatible with the type it guessed.
In this case X14 was guessed to be logical, because the column was all NAs until its first value, 189861, a number that appears on row 11910.

This implies that if read_tsv included row 11910 in its column type guessing, it would guess the right type for X14.
That seems to be the case:

kenya_geodump <- read_tsv(KE, col_names = FALSE,
                              guess_max = 11910)
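
If you want to try this without downloading anything, here is a minimal sketch on a made-up file. The toy data, the object names (toy, guessed, widened, typed) and the guess_max value of 2000 are all assumptions for illustration, not part of the geonames tutorial. It shows two ways around the problem: widening guess_max, as above, or stating the column type yourself with col_types. Whether the default read actually reports parsing failures may depend on your readr version, so no output is shown.

```r
library(readr)

# Hypothetical toy file standing in for KE.txt: the second column is empty
# for the first 1,500 rows and only then holds a number.
toy <- tempfile(fileext = ".tsv")
writeLines(c(paste0(1:1500, "\t"), "1501\t189861"), toy)

# Default guessing: the late value may be reported as a parsing failure,
# depending on the readr version. problems() lists whatever was reported.
guessed <- read_tsv(toy, col_names = FALSE)
problems(guessed)

# Option 1: let the type guess look past the first non-missing value.
widened <- read_tsv(toy, col_names = FALSE, guess_max = 2000)

# Option 2: state the type of the troublesome column and guess the rest.
typed <- read_tsv(
  toy,
  col_names = FALSE,
  col_types = cols(X2 = col_double(), .default = col_guess())
)

# Both reads keep the late value as a number.
widened$X2[1501]
typed$X2[1501]
```

For the Kenya file the equivalent of option 2 would presumably be cols(X14 = col_double(), .default = col_guess()). That avoids having to know which row holds the first non-missing value, at the cost of hard-coding the column name.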

siobhang · July 20, 2021, 4:47pm

Thank you! That worked.

Thanks as well for the explanation.
