Read text file using read_csv from Tidyverse

readr

#1

I’m trying to read the file below into a tibble, but this doesn’t work.

# SOURCE: KONINKLIJK NEDERLANDS METEOROLOGISCH INSTITUUT (KNMI)
# Note: because of station relocations and changes in observation methods, these time series of daily values may be inhomogeneous! This means that this series of measured values is not suitable for trend analysis. For studies of climate change, we refer you to the homogenized series of monthly temperatures for De Bilt <http://www.knmi.nl/kennis-en-datacentrum/achtergrond/gehomogeniseerde-reeks-maandtemperaturen-de-bilt> or the Central Netherlands Temperature <http://www.knmi.nl/kennis-en-datacentrum/achtergrond/centraal-nederland-temperatuur-cnt>.
# 
# 
# STN      LON(east)   LAT(north)     ALT(m)  NAME
# 209:         4.518       52.465       0.00  IJMOND
# 210:         4.430       52.171      -0.20  VALKENBURG
# 215:         4.437       52.141      -1.10  VOORSCHOTEN
# 225:         4.555       52.463       4.40  IJMUIDEN
# 235:         4.781       52.928       1.20  DE KOOY
# 240:         4.790       52.318      -3.30  SCHIPHOL
# 242:         4.921       53.241      10.80  VLIELAND
# 248:         5.174       52.634       0.80  WIJDENES
# 249:         4.979       52.644      -2.40  BERKHOUT
# 251:         5.346       53.392       0.70  HOORN (TERSCHELLING)
# 257:         4.603       52.506       8.50  WIJK AAN ZEE
# 258:         5.401       52.649       7.30  HOUTRIBDIJK
# 260:         5.180       52.100       1.90  DE BILT
# 265:         5.274       52.130      13.90  SOESTERBERG
# 267:         5.384       52.898      -1.30  STAVOREN
# 269:         5.520       52.458      -3.70  LELYSTAD
# 270:         5.752       53.224       1.20  LEEUWARDEN
# 273:         5.888       52.703      -3.30  MARKNESSE
# 275:         5.873       52.056      48.20  DEELEN
# 277:         6.200       53.413       2.90  LAUWERSOOG
# 278:         6.259       52.435       3.60  HEINO
# 279:         6.574       52.750      15.80  HOOGEVEEN
# 280:         6.585       53.125       5.20  EELDE
# 283:         6.657       52.069      29.10  HUPSEL
# 285:         6.399       53.575       0.00  HUIBERTGAT
# 286:         7.150       53.196      -0.20  NIEUW BEERTA
# 290:         6.891       52.274      34.80  TWENTHE
# 308:         3.379       51.381       0.00  CADZAND
# 310:         3.596       51.442       8.00  VLISSINGEN
# 311:         3.672       51.379       0.00  HOOFDPLAAT
# 312:         3.622       51.768       0.00  OOSTERSCHELDE
# 313:         3.242       51.505       0.00  VLAKTE V.D. RAAN
# 315:         3.998       51.447       0.00  HANSWEERT
# 316:         3.694       51.657       0.00  SCHAAR
# 319:         3.861       51.226       1.70  WESTDORPE
# 323:         3.884       51.527       1.40  WILHELMINADORP
# 324:         4.006       51.596       0.00  STAVENISSE
# 330:         4.122       51.992      11.90  HOEK VAN HOLLAND
# 331:         4.193       51.480       0.00  THOLEN
# 340:         4.342       51.449      19.20  WOENSDRECHT
# 343:         4.313       51.893       3.50  R'DAM-GEULHAVEN
# 344:         4.447       51.962      -4.30  ROTTERDAM
# 348:         4.926       51.970      -0.70  CABAUW
# 350:         4.936       51.566      14.90  GILZE-RIJEN
# 356:         5.146       51.859       0.70  HERWIJNEN
# 370:         5.377       51.451      22.60  EINDHOVEN
# 375:         5.707       51.659      22.00  VOLKEL
# 377:         5.763       51.198      30.00  ELL
# 380:         5.762       50.906     114.30  MAASTRICHT
# 391:         6.197       51.498      19.50  ARCEN
# 

Does anyone know how to make this work?

Thanks,

Fritsander


#2

It’s really hard to read with the current formatting. It looks like you have a file that isn’t actually comma-separated, but, again, it’s hard to tell from what you’ve posted.

I don’t know if you’ve taken a look at the reprex package yet, but it essentially helps you make an example of the problem you’re having so that someone else can run it themselves to help you troubleshoot. It’s pretty awesome, and definitely worth getting acquainted with for asking questions on here.

Nick Tierney wrote a great post with gifs about it on his blog: http://www.njtierney.com/post/2017/01/11/magic-reprex/


#3

Hi @fritsander, what exactly did you try? Do you have a snippet of the code you tried?
Using reprex like @mara suggests is even better!

If your file is not already available online, you can put it online with a gist, or with another service such as Dropbox or Google Drive using a public link.

All this could help us help you ! :wink:


#4

If the data as it appears in the post is representative of the file, then it is space-delimited, but the place names can contain spaces that are not delimiters, and there is no quoting or escaping of the entries where the spaces are part of the text. In that case:

read it in with readLines() to get a vector of lines, subset the vector to the part with the data, then split each line so that the first four space-delimited entries become separate variables and everything else becomes part of the last variable.
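A minimal base R sketch of that readLines() approach, assuming the station rows are the only lines that start with "# " followed by a number and a colon (true for the data posted above, but worth checking against the real file). The temporary file and the few sample rows stand in for the real download; the regular expression captures the first four whitespace-separated fields and keeps everything after them, spaces included, as NAME:

```r
# Write a few rows from the post to a temporary file so the example is self-contained
path <- tempfile(fileext = ".txt")
writeLines(c(
  "# BRON: KONINKLIJK NEDERLANDS METEOROLOGISCH INSTITUUT (KNMI)",
  "# ",
  "# STN      LON(east)   LAT(north)     ALT(m)  NAME",
  "# 391:         6.197       51.498      19.50  ARCEN",
  "# 286:         7.150       53.196      -0.20  NIEUW BEERTA",
  "# 251:         5.346       53.392       0.70  HOORN (TERSCHELLING)",
  "# "
), path)

lines <- readLines(path)

# Keep only the station rows: "# ", a station number, then a colon
rows <- grep("^# [0-9]+:", lines, value = TRUE)

# Drop the leading "#" and surrounding whitespace
rows <- trimws(sub("^#", "", rows))

# Four whitespace-separated fields, then the rest of the line as NAME
pattern <- "^([0-9]+):[[:space:]]+([^[:space:]]+)[[:space:]]+([^[:space:]]+)[[:space:]]+([^[:space:]]+)[[:space:]]+(.*)$"
m <- do.call(rbind, regmatches(rows, regexec(pattern, rows)))

stations <- data.frame(
  STN  = as.integer(m[, 2]),
  LON  = as.numeric(m[, 3]),
  LAT  = as.numeric(m[, 4]),
  ALT  = as.numeric(m[, 5]),
  NAME = trimws(m[, 6]),
  stringsAsFactors = FALSE
)
stations
```

Because NAME is captured as "the rest of the line", multi-word names such as NIEUW BEERTA and HOORN (TERSCHELLING) stay intact.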


#5

Hi Fritsander,

I’m assuming your data comes from this page:

http://projects.knmi.nl/klimatologie/uurgegevens/getdata_uur.cgi

Looking at the data, it seems to be a fixed-width file:

# STN      LON(east)   LAT(north)     ALT(m)  NAME
# 391:         6.197       51.498      19.50  ARCEN
# 370:         5.377       51.451      22.60  EINDHOVEN
# 331:         4.193       51.480       0.00  THOLEN
# 315:         3.998       51.447       0.00  HANSWEERT
# 324:         4.006       51.596       0.00  STAVENISSE
# 375:         5.707       51.659      22.00  VOLKEL
# 380:         5.762       50.906     114.30  MAASTRICHT
# 240:         4.790       52.318      -3.30  SCHIPHOL
# 286:         7.150       53.196      -0.20  NIEUW BEERTA

I’m struggling to parse this myself, so if anyone can help…

library(readr)

path <- "http://projects.knmi.nl/klimatologie/uurgegevens/getdata_uur.cgi"

# Use nchar() to find where each column ends (cumulative end positions)
col_a <- nchar('# 391:')                                        # 6
col_b <- nchar('# 391:         6.197')                          # 20
col_c <- nchar('# 370:         5.377       51.451')             # 33
col_d <- nchar('# 380:         5.762       50.906     114.30')  # 44

# fwf_widths() wants the width of each field, not its end position, so take
# differences; NA lets the last field (NAME) run to the end of the line
widths <- c(col_a, diff(c(col_a, col_b, col_c, col_d)), NA)

data <- read_fwf(
  file = path,
  col_positions = fwf_widths(widths, c("STN", "LON", "LAT", "ALT", "NAME")),
  skip = 5,
  n_max = 50
)
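Rather than counting characters by hand, the column boundaries can also be found programmatically: in a fixed-width file, the gaps between fields are character positions that are blank on every line. The sketch below applies that idea to three of the sample rows. It is a hand-written base R illustration of the same heuristic that readr's fwf_empty() is based on (guessing columns from the positions of empty columns), not readr's actual implementation:

```r
# Three rows copied from the sample above
lines <- c(
  "# 391:         6.197       51.498      19.50  ARCEN",
  "# 380:         5.762       50.906     114.30  MAASTRICHT",
  "# 286:         7.150       53.196      -0.20  NIEUW BEERTA"
)

# Left-justify and pad every line to a common width so positions line up
padded <- format(lines)
width <- nchar(padded[1])

# One row per line, one column per character position
chars <- do.call(rbind, strsplit(padded, ""))

# TRUE where the position is a space on every line
blank <- apply(chars == " ", 2, all)

# A field starts wherever a non-blank position follows a blank one
starts <- which(!blank & c(TRUE, head(blank, -1)))
ends <- c(starts[-1] - 1, width)

# Cut each line at those boundaries: rows are lines, columns are fields
fields <- mapply(function(s, e) trimws(substr(padded, s, e)), starts, ends)
starts
fields
```

The first field is just the "#" comment marker and STN keeps its trailing colon, so a little cleanup is still needed after splitting.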

#6

Well done finding a cleaner data source! readr makes fixed-width files much easier: it provides several helpers (fwf_empty(), fwf_widths(), fwf_positions()) for describing the columns in different ways. See ?read_fwf for more detail (although the documentation for each of the fwf_* functions is terser than I would have liked). fwf_empty() worked fine in this example with no tweaking other than the column names.

writeLines("STN      LON(east)   LAT(north)     ALT(m)  NAME        
391:         6.197       51.498      19.50  ARCEN                   
370:         5.377       51.451      22.60  EINDHOVEN               
331:         4.193       51.480       0.00  THOLEN                  
315:         3.998       51.447       0.00  HANSWEERT               
324:         4.006       51.596       0.00  STAVENISSE              
375:         5.707       51.659      22.00  VOLKEL                  
380:         5.762       50.906     114.30  MAASTRICHT              
240:         4.790       52.318      -3.30  SCHIPHOL                
286:         7.150       53.196      -0.20  NIEUW BEERTA",
  'myfile.txt'
)

library(readr)

readr::read_fwf(
  'myfile.txt',
  col_positions = fwf_empty('myfile.txt', col_names = c('STN', 'LON', 'LAT', 'ALT', 'NAME')),
  skip = 1
)
#> Parsed with column specification:
#> cols(
#>   STN = col_character(),
#>   LON = col_double(),
#>   LAT = col_double(),
#>   ALT = col_double(),
#>   NAME = col_character()
#> )
#> # A tibble: 9 x 5
#>     STN   LON    LAT   ALT         NAME
#>   <chr> <dbl>  <dbl> <dbl>        <chr>
#> 1  391: 6.197 51.498  19.5        ARCEN
#> 2  370: 5.377 51.451  22.6    EINDHOVEN
#> 3  331: 4.193 51.480   0.0       THOLEN
#> 4  315: 3.998 51.447   0.0    HANSWEERT
#> 5  324: 4.006 51.596   0.0   STAVENISSE
#> 6  375: 5.707 51.659  22.0       VOLKEL
#> 7  380: 5.762 50.906 114.3   MAASTRICHT
#> 8  240: 4.790 52.318  -3.3     SCHIPHOL
#> 9  286: 7.150 53.196  -0.2 NIEUW BEERTA

EDIT: Apologies @pgensler, as I did not realize you were using readr as well. Also, I had trouble reading from the link directly for some reason, so I just grabbed the snippet you included above.


#7

@cole No worries. The only problem is that I can’t upload a .txt file to RStudio Community, and I don’t think that code snippet accurately describes the data. I’m almost tempted to share the byte sequence so it’s accurate. Try the link below to test it out.

@fritsander I’m assuming you just wanted this piece of the file, correct?


#8

I think you’re right. Is this a little closer? I used the same writeLines() hack, but hopefully it’s a little more accurate this time. (Note: the body of a message is apparently limited to 32k characters, so I truncated some of it.)

writeLines(                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
"# BRON: KONINKLIJK NEDERLANDS METEOROLOGISCH INSTITUUT (KNMI)                                                                                                                                                                                                                                                                                                                                                                                                                                                            
# Opmerking: door stationsverplaatsingen ...
#                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
#                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
# STN      LON(east)   LAT(north)     ALT(m)  NAME                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
# 391:         6.197       51.498      19.50  ARCEN                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
# 370:         5.377       51.451      22.60  EINDHOVEN                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
# 331:         4.193       51.480       0.00  THOLEN                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
# 315:         3.998       51.447       0.00  HANSWEERT                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
# 324:         4.006       51.596       0.00  STAVENISSE                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
# 375:         5.707       51.659      22.00  VOLKEL                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
# 380:         5.762       50.906     114.30  MAASTRICHT                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
# 240:         4.790       52.318      -3.30  SCHIPHOL                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
# 286:         7.150       53.196      -0.20  NIEUW BEERTA                                                                                                                                                                                                                                                                                                                                                                                                                                                                
# 310:         3.596       51.442       8.00  VLISSINGEN                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
# 283:         6.657       52.069      29.10  HUPSEL                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
# 280:         6.585       53.125       5.20  EELDE                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
# 273:         5.888       52.703      -3.30  MARKNESSE                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
# 323:         3.884       51.527       1.40  WILHELMINADORP                                                                                                                                                                                                                                                                                                                                                                                                                                                              
# 249:         4.979       52.644      -2.40  BERKHOUT                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
# 377:         5.763       51.198      30.00  ELL                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
# 316:         3.694       51.657       0.00  SCHAAR                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
# 313:         3.242       51.505       0.00  VLAKTE V.D. RAAN                                                                                                                                                                                                                                                                                                                                                                                                                                                            
# 277:         6.200       53.413       2.90  LAUWERSOOG                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
# 348:         4.926       51.970      -0.70  CABAUW                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
# 308:         3.379       51.381       0.00  CADZAND                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
# 319:         3.861       51.226       1.70  WESTDORPE                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
# 215:         4.437       52.141      -1.10  VOORSCHOTEN                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
# 278:         6.259       52.435       3.60  HEINO                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
# 285:         6.399       53.575       0.00  HUIBERTGAT                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
# 343:         4.313       51.893       3.50  R'DAM-GEULHAVEN                                                                                                                                                                                                                                                                                                                                                                                                                                                             
# 225:         4.555       52.463       4.40  IJMUIDEN                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
# 330:         4.122       51.992      11.90  HOEK VAN HOLLAND                                                                                                                                                                                                                                                                                                                                                                                                                                                            
# 267:         5.384       52.898      -1.30  STAVOREN                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
# 269:         5.520       52.458      -3.70  LELYSTAD                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
# 344:         4.447       51.962      -4.30  ROTTERDAM                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
# 275:         5.873       52.056      48.20  DEELEN                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
# 235:         4.781       52.928       1.20  DE KOOY                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
# 257:         4.603       52.506       8.50  WIJK AAN ZEE                                                                                                                                                                                                                                                                                                                                                                                                                                                                
# 290:         6.891       52.274      34.80  TWENTHE                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
# 350:         4.936       51.566      14.90  GILZE-RIJEN                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
# 251:         5.346       53.392       0.70  HOORN (TERSCHELLING)
# 210:         4.430       52.171      -0.20  VALKENBURG
# 248:         5.174       52.634       0.80  WIJDENES
# 279:         6.574       52.750      15.80  HOOGEVEEN
# 258:         5.401       52.649       7.30  HOUTRIBDIJK
# 356:         5.146       51.859       0.70  HERWIJNEN
# 209:         4.518       52.465       0.00  IJMOND
# 265:         5.274       52.130      13.90  SOESTERBERG"
,'myfile.txt')
library(readr)

readr::read_fwf('myfile.txt',
  col_positions = fwf_empty('myfile.txt', skip = 5,
    col_names = c('#', 'STN', 'LON', 'LAT', 'ALT', 'NAME')),
  skip = 5
)
#> Parsed with column specification:
#> cols(
#>   `#` = col_character(),
#>   STN = col_character(),
#>   LON = col_double(),
#>   LAT = col_double(),
#>   ALT = col_double(),
#>   NAME = col_character()
#> )
#> # A tibble: 50 x 6
#>      `#`   STN   LON    LAT   ALT         NAME
#>    <chr> <chr> <dbl>  <dbl> <dbl>        <chr>
#>  1     #  391: 6.197 51.498  19.5        ARCEN
#>  2     #  370: 5.377 51.451  22.6    EINDHOVEN
#>  3     #  331: 4.193 51.480   0.0       THOLEN
#>  4     #  315: 3.998 51.447   0.0    HANSWEERT
#>  5     #  324: 4.006 51.596   0.0   STAVENISSE
#>  6     #  375: 5.707 51.659  22.0       VOLKEL
#>  7     #  380: 5.762 50.906 114.3   MAASTRICHT
#>  8     #  240: 4.790 52.318  -3.3     SCHIPHOL
#>  9     #  286: 7.150 53.196  -0.2 NIEUW BEERTA
#> 10     #  310: 3.596 51.442   8.0   VLISSINGEN
#> # ... with 40 more rows
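Because every station line follows the same `# STN: LON LAT ALT NAME` pattern, a regular expression is another way to read this table. It avoids guessing column positions and also drops the stray `#` and `:` that end up in the `#` and `STN` columns above. This is a minimal sketch, assuming the station lines look exactly like the ones written to `myfile.txt`; the helper name `parse_stations` and the two sample lines are made up for illustration:

```r
library(stringr)
library(tibble)

# Hypothetical helper: turn KNMI station lines such as
# "# 260:         5.180       52.100       1.90  DE BILT"
# into a tibble with typed columns. Lines that do not match are ignored.
parse_stations <- function(lines) {
  pattern <- "^#\\s*(\\d+):\\s+(-?[0-9.]+)\\s+(-?[0-9.]+)\\s+(-?[0-9.]+)\\s+(.*?)\\s*$"
  m <- str_match(lines, pattern)
  # Keep only the rows where the whole pattern matched
  m <- m[!is.na(m[, 1]), , drop = FALSE]
  tibble(
    STN  = as.integer(m[, 2]),
    LON  = as.numeric(m[, 3]),
    LAT  = as.numeric(m[, 4]),
    ALT  = as.numeric(m[, 5]),
    NAME = m[, 6]  # names may contain spaces, e.g. "HOORN (TERSCHELLING)"
  )
}

# Made-up sample: the header line is skipped because it has no "NNN:" prefix
sample_lines <- c(
  "# STN      LON(east)   LAT(north)     ALT(m)  NAME",
  "# 251:         5.346       53.392       0.70  HOORN (TERSCHELLING)",
  "# 240:         4.790       52.318      -3.30  SCHIPHOL"
)
stations <- parse_stations(sample_lines)
```

On the real file, `parse_stations(readr::read_lines('myfile.txt'))` would pick out the station rows wherever they are, so neither `skip` nor `n_max` is needed.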

#9

This is some wacky data. I tried to run it on my computer, and it still fails. It's as if it won't skip the first 5 lines, which I think is what makes it fail:

library(readr)

d <- readr::read_fwf('~/Downloads/KNMI_20171127_hourly.txt',
  col_positions = fwf_empty('~/Downloads/KNMI_20171127_hourly.txt', skip = 5,
    col_names = c('#', 'STN', 'LON', 'LAT', 'ALT', 'NAME')),
  skip = 5
)
#> Parsed with column specification:
#> cols(
#>   `#` = col_character(),
#>   STN = col_character(),
#>   LON = col_character(),
#>   LAT = col_character(),
#>   ALT = col_character(),
#>   NAME = col_character()
#> )
#> Warning in rbind(names(probs), probs_f): number of columns of result is not
#> a multiple of vector length (arg 1)
#> Warning: 126011 parsing failures.
#> # A tibble: 5 x 5
#>     row   col  expected    actual                                   file
#>   <int> <chr>     <chr>     <chr>                                  <chr>
#> 1     1   STN 272 chars        49 '~/Downloads/KNMI_20171127_hourly.txt'
#> 2     1  <NA> 6 columns 2 columns '~/Downloads/KNMI_20171127_hourly.txt'
#> 3     2   STN 272 chars        53 '~/Downloads/KNMI_20171127_hourly.txt'
#> 4     2  <NA> 6 columns 2 columns '~/Downloads/KNMI_20171127_hourly.txt'
#> 5     3   STN 272 chars        50 '~/Downloads/KNMI_20171127_hourly.txt'
#> ...
#> See problems(...) for more details.
glimpse(d)
#> Error in glimpse(d): could not find function "glimpse"

#10

Yes. This is wild 😄 I think I figured out my problem, though: the server only allows ~100 kb/s (on a ~50 megabit connection on my end), which explains why the read looks like it hangs. I downloaded the raw file and eventually saw the real issue: line 57 starts what is basically an entirely different file.

So stopping there gives us what we want:

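library(readr)
filepath <- '~/Downloads/KNMI_20171127_hourly.txt'

# Guess the column positions from the first 10 station lines (after the 5 header lines)
file_spec <- fwf_empty(filepath,
  skip = 5,
  col_names = c('#', 'STN', 'LON', 'LAT', 'ALT', 'NAME'),
  n = 10
)

# Read only lines 6-55 (the station table); the rest of the file has a different layout
readr::read_fwf(filepath, col_positions = file_spec,
  skip = 5,
  n_max = 55 - 5
)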
#> Parsed with column specification:
#> cols(
#>   `#` = col_character(),
#>   STN = col_character(),
#>   LON = col_double(),
#>   LAT = col_double(),
#>   ALT = col_double(),
#>   NAME = col_character()
#> )
#> # A tibble: 50 x 6
#>      `#`   STN   LON    LAT   ALT         NAME
#>    <chr> <chr> <dbl>  <dbl> <dbl>        <chr>
#>  1     #  391: 6.197 51.498  19.5        ARCEN
#>  2     #  370: 5.377 51.451  22.6    EINDHOVEN
#>  3     #  331: 4.193 51.480   0.0       THOLEN
#>  4     #  315: 3.998 51.447   0.0    HANSWEERT
#>  5     #  324: 4.006 51.596   0.0   STAVENISSE
#>  6     #  375: 5.707 51.659  22.0       VOLKEL
#>  7     #  380: 5.762 50.906 114.3   MAASTRICHT
#>  8     #  240: 4.790 52.318  -3.3     SCHIPHOL
#>  9     #  286: 7.150 53.196  -0.2 NIEUW BEERTA
#> 10     #  310: 3.596 51.442   8.0   VLISSINGEN
#> # ... with 40 more rows
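Hard-coding `n_max = 55 - 5` works for this download, but the line numbers will shift if a different set of stations is requested. A minimal sketch that derives `skip` and `n_max` from the lines themselves, assuming every station row starts with `#`, a station number, and a colon. The helper `station_block` is hypothetical, and `example_lines` is a shortened, made-up stand-in for the real file:

```r
# Hypothetical helper: locate the first contiguous block of station rows
# ("# 391:  6.197  51.498 ...") and return the readr arguments that select it.
station_block <- function(lines) {
  is_station <- grepl("^#[[:space:]]*[0-9]+:", lines)
  first <- which(is_station)[1]
  if (is.na(first)) stop("no station rows found")
  # Length of the run of TRUE values starting at the first station row
  run <- rle(is_station[first:length(lines)])
  list(skip = first - 1L, n_max = run$lengths[1])
}

example_lines <- c(
  "# BRON: KONINKLIJK NEDERLANDS METEOROLOGISCH INSTITUUT (KNMI)",
  "# Opmerking: ...",
  "# ",
  "# ",
  "# STN      LON(east)   LAT(north)     ALT(m)  NAME",
  "# 391:         6.197       51.498      19.50  ARCEN",
  "# 370:         5.377       51.451      22.60  EINDHOVEN",
  "# ",
  "# (placeholder for the rest of the file)"
)
block <- station_block(example_lines)
```

For the hourly file in this thread, `station_block(readr::read_lines(filepath))` should give `skip = 5` and `n_max = 50`, which could then replace the hard-coded values passed to `fwf_empty()` and `read_fwf()`.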

#11

Hi Guys,

This was a great help. What I did was download the entire file of daily weather data for about fifty weather stations from the Royal Netherlands Meteorological Institute (http://projects.knmi.nl/klimatologie/daggegevens/getdata_dag.cgi). The daily data for those stations starts after roughly line 60. The first part of the file lists each station's identification number, longitude, latitude, altitude, and name. That part of the file was the subject of my question, because I struggled to read it correctly. Later, I will try your suggestions to read this part of the file into a separate tibble or data frame.

Thanks a lot,

Fritsander


#12

That makes a lot more sense! Note that you can automate the download itself with the curl or httr package (as long as you can generate or find the URL; rvest may also be worth a look for scraping HTML pages to find URLs). Then you can find exactly where the transition happens with something like the following (it will probably take a combination of the approaches I used):

filepath <- '~/Downloads/KNMI_20171127_hourly.txt'
raw <- readLines(filepath)                        
                                                  
raw <- raw[1:120]                                 
                                                  
stringr::str_length(raw)                          
#>   [1]  61 506   2   2  50  51  55  52  55  56  52  56  54  58  56  52  51
#>  [18]  55  60  54  49  52  62  56  52  53  55  57  51  56  61  54  62  54
#>  [35]  54  55  52  53  58  53  57  66  56  54  55  57  55  52  57  59  57
#>  [52]  53  56  56  54   2  48  95 261 148  99  71  89  95  98 114  53  59
#>  [69]  68  83 210 122  90 199 462 107 108 109 109 113   2 152   2 152 152
#>  [86] 152 152 152 152 152 152 152 152 152 152 152 152 152 152 152 152 152
#> [103] 152 152 152 152 152 152 152 152 152 152 152 152 152 152 152 152 152
#> [120] 152
stringr::str_detect(raw, '^\\s*\\#')          
#>   [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
#>  [12]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
#>  [23]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
#>  [34]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
#>  [45]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
#>  [56]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
#>  [67]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
#>  [78]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
#>  [89] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [100] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> [111] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
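The `str_detect()` output can be turned into line numbers directly instead of being read off by eye. A small sketch, assuming the data section is the first run of lines that do not start with `#`, and that its column header is the last line beginning with `# STN,` before that run. The helper `find_data_section` and the `toy` lines are hypothetical:

```r
# Hypothetical helper: return the line number of the "# STN,..." header
# and of the first data row in a KNMI download.
find_data_section <- function(raw) {
  is_comment <- grepl("^[[:space:]]*#", raw)
  first_data <- which(!is_comment)[1]
  if (is.na(first_data)) stop("no data rows found")
  # The station table header ("# STN      LON(east) ...") has no comma,
  # so only the data header matches this pattern
  header_candidates <- which(grepl("^#[[:space:]]*STN,", raw[seq_len(first_data - 1)]))
  if (length(header_candidates) == 0) stop("no '# STN,' header line found")
  list(header = header_candidates[length(header_candidates)],
       first_data = first_data)
}

# Made-up miniature of the file layout
toy <- c(
  "# BRON: KONINKLIJK NEDERLANDS METEOROLOGISCH INSTITUUT (KNMI)",
  "# STN      LON(east)   LAT(north)     ALT(m)  NAME",
  "# 391:         6.197       51.498      19.50  ARCEN",
  "# ",
  "# STN,YYYYMMDD,   HH,   DD",
  "# ",
  "  391,20171001,    1,  170",
  "  391,20171001,    2,  170"
)
sections <- find_data_section(toy)
```

Run on `readLines(filepath)` for the hourly file, this should return 82 for the header and 84 for the first data row, consistent with the `str_detect()` output above.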

#13

Thanks @fritsander for the clarification. Looking back at your question, I see what you were trying to accomplish, but it would have helped to have a link to the raw file so we could see where that piece of data ends (which is where you wanted to start reading in the data). I’d be curious to see whether you could use something like https://data.world/, since its main purpose is hosting publicly available data such as files like this, and it’s free. I’ve been pretty impressed with their offering from what I’ve used so far. Does that make sense?

I’d also strongly encourage you to open the raw file in a text editor to spot problems like this (Visual Studio Code works great for this and is also free: https://code.visualstudio.com/).

Let me know if the code below is what you were expecting.

@cole I don’t know why, but I keep getting this warning (see below for the full output):

Warning message:
In rbind(names(probs), probs_f) :
  number of columns of result is not a multiple of vector length (arg 2)

I think you could also use readr::read_csv. Let me know if this helps.

path <- "~/Downloads/KNMI_20171127_hourly.txt"
library(tidyverse)
#> ── Attaching packages ──────────────────────────────────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 2.2.1     ✔ purrr   0.2.4
#> ✔ tibble  1.3.4     ✔ dplyr   0.7.4
#> ✔ tidyr   0.7.2     ✔ stringr 1.2.0
#> ✔ readr   1.1.1     ✔ forcats 0.2.0
#> ── Conflicts ─────────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()

data <- readr::read_delim(path, delim = ',', skip = 81)
#> Parsed with column specification:
#> cols(
#>   .default = col_character(),
#>   YYYYMMDD = col_integer()
#> )
#> See spec(...) for full column specifications.
#> Warning in rbind(names(probs), probs_f): number of columns of result is not
#> a multiple of vector length (arg 2)
#> Warning: 1 parsing failure.
#> # A tibble: 1 x 5
#>     row   col   expected    actual                                   file
#>   <int> <chr>      <chr>     <chr>                                  <chr>
#> 1     1  <NA> 25 columns 1 columns '~/Downloads/KNMI_20171127_hourly.txt'
glimpse(data)
#> Observations: 62,929
#> Variables: 25
#> $ `# STN`  <chr> "# ", "  391", "  391", "  391", "  391", "  391", " ...
#> $ YYYYMMDD <int> NA, 20171001, 20171001, 20171001, 20171001, 20171001,...
#> $ `   HH`  <chr> NA, "    1", "    2", "    3", "    4", "    5", "   ...
#> $ `   DD`  <chr> NA, "  170", "  170", "  170", "  190", "  150", "  1...
#> $ `   FH`  <chr> NA, "   20", "   20", "   20", "   20", "   20", "   ...
#> $ `   FF`  <chr> NA, "   20", "   20", "   20", "   10", "   20", "   ...
#> $ `   FX`  <chr> NA, "   30", "   30", "   40", "   30", "   40", "   ...
#> $ `    T`  <chr> NA, "   96", "   96", "   95", "   97", "   98", "  1...
#> $ `  T10`  <chr> NA, "     ", "     ", "     ", "     ", "     ", "   ...
#> $ `   TD`  <chr> NA, "   93", "   88", "   87", "   92", "   90", "   ...
#> $ `   SQ`  <chr> NA, "    0", "    0", "    0", "    0", "    0", "   ...
#> $ `    Q`  <chr> NA, "    0", "    0", "    0", "    0", "    0", "   ...
#> $ `   DR`  <chr> NA, "    0", "    0", "    0", "    0", "    0", "   ...
#> $ `   RH`  <chr> NA, "    0", "    0", "    0", "    0", "    0", "   ...
#> $ `    P`  <chr> NA, "     ", "     ", "     ", "     ", "     ", "   ...
#> $ `   VV`  <chr> NA, "     ", "     ", "     ", "     ", "     ", "   ...
#> $ `    N`  <chr> NA, "     ", "     ", "     ", "     ", "     ", "   ...
#> $ `    U`  <chr> NA, "   98", "   95", "   94", "   96", "   94", "   ...
#> $ `   WW`  <chr> NA, "     ", "     ", "     ", "     ", "     ", "   ...
#> $ `   IX`  <chr> NA, "    6", "    6", "    6", "    6", "    6", "   ...
#> $ `    M`  <chr> NA, "     ", "     ", "     ", "     ", "     ", "   ...
#> $ `    R`  <chr> NA, "     ", "     ", "     ", "     ", "     ", "   ...
#> $ `    S`  <chr> NA, "     ", "     ", "     ", "     ", "     ", "   ...
#> $ `    O`  <chr> NA, "     ", "     ", "     ", "     ", "     ", "   ...
#> $ `    Y`  <chr> NA, "     ", "     ", "     ", "     ", "     ", "   ...
problems(data)
#> # A tibble: 1 x 5
#>     row   col   expected    actual                                   file
#>   <int> <chr>      <chr>     <chr>                                  <chr>
#> 1     1  <NA> 25 columns 1 columns '~/Downloads/KNMI_20171127_hourly.txt'

#14

It looks like the file has a header row (line 82) followed by a row containing just “#” (line 83), and that row seems to be the one triggering the warning. The parsing failure means the parser expected 25 columns but found only 1. `problems()` reports row 1, i.e. the first row after the header on line 82, so that is where to look when debugging. Apart from that, read_delim / read_csv does a great job with that second batch of data!
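One way to avoid that warning is to take the column names from the header line yourself and start parsing at the first data row, so the lone `#` row is never seen by the parser. The sketch below uses `read_csv()` rather than `read_delim()` because `read_csv()` trims surrounding whitespace by default, so padded values such as `"  391"` can be parsed as numbers instead of ending up as character columns like in the `glimpse()` output. The helper `read_knmi_hourly` and the toy file are hypothetical; the line numbers are assumed to come from inspecting the file as described above:

```r
library(readr)

# Hypothetical helper: read the hourly block of a KNMI file, given the
# line number of the "# STN,..." header and of the first data row.
read_knmi_hourly <- function(path, header_line, first_data_line) {
  header <- read_lines(path, skip = header_line - 1, n_max = 1)
  # "# STN,YYYYMMDD,   HH" -> c("STN", "YYYYMMDD", "HH")
  col_names <- trimws(strsplit(sub("^#[[:space:]]*", "", header), ",")[[1]])
  # Skip everything up to the first data row, including the lone "#" row
  read_csv(path,
    skip = first_data_line - 1,
    col_names = col_names,
    trim_ws = TRUE
  )
}

# Made-up miniature of the file layout, written to a temporary file
toy_path <- tempfile(fileext = ".txt")
writeLines(c(
  "# BRON: KONINKLIJK NEDERLANDS METEOROLOGISCH INSTITUUT (KNMI)",
  "# STN,YYYYMMDD,   HH,   DD,    T",
  "# ",
  "  391,20171001,    1,  170,   96",
  "  391,20171001,    2,  170,   96"
), toy_path)
hourly <- read_knmi_hourly(toy_path, header_line = 2, first_data_line = 4)
```

For the download in this thread, that would be `read_knmi_hourly(path, header_line = 82, first_data_line = 84)`. Blank, space-only fields like the ones in `T10` or `P` should then come through as `NA`.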