Reading .txt skipping header and footer

whydoge · August 2, 2020, 10:09am

Hey there,
I'm new to R.
Having collected data from a spectrometer, I get .txt files containing about 1200 lines, including a header (19 lines) and a footer (40 lines).
I presume the header and footer always have the same lengths.
I know how to skip the first lines (e.g. 19 for the header), but I can't skip the last lines...
Is there a way to read data only from "line 20 to lastline-40"?
Or any other possibility to exclude the last 40 lines?
Thanks for helping!
Flo

Hlynur · August 3, 2020, 10:51am

Hi whydoge, as far as I'm aware there is currently no argument to skip footers in either base R or the tidyverse.

You have two options:

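If you know a) the number of lines of actual data, b) the number of lines in the header and c) the number of lines in the footer, you can use the n_max argument of the read_delim function in readr. For the example you mention, you'd set skip = 19 and n_max = 1200-40-19, or just n_max = 1141. Note that n_max counts data rows only: by default the first line after the skipped ones is taken as the column names, so if your data starts right on line 20, also set col_names = FALSE.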

suppressPackageStartupMessages(library(dplyr))
library(readr)
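# Creating some data. write_delim() also writes the column name "col_1" as the
# first line, so after skip = 19 the last "Header" line is used as the column name.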
tibble(col_1 = c(rep("Header", 19),
                 1:(1200-40-19),
                 rep("Footer", 40))) %>% 
  write_delim("mydata.txt")

read_delim("mydata.txt", 
           delim = ",",
           skip = 19,
           n_max = 1200-40-19)
#> Parsed with column specification:
#> cols(
#>   Header = col_double()
#> )
#> # A tibble: 1,141 x 1
#>    Header
#>     <dbl>
#>  1      1
#>  2      2
#>  3      3
#>  4      4
#>  5      5
#>  6      6
#>  7      7
#>  8      8
#>  9      9
#> 10     10
#> # … with 1,131 more rows

Created on 2020-08-03 by the reprex package (v0.3.0)
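
If the total number of lines differs from file to file, the "lastline-40" part of the question can be handled by counting the lines first and deriving n_max from that count. The following is only a sketch under assumptions: the file is a made-up stand-in with 19 header lines, 1141 data lines and 40 footer lines, the names path, n_header, n_footer and spectrum are invented for illustration, and the data is assumed to start directly after the header with no column-name line (hence col_names = FALSE).

```r
library(readr)

# Hypothetical stand-in for a spectrometer file:
# 19 header lines, 1141 data lines, 40 footer lines
path <- tempfile(fileext = ".txt")
write_lines(c(rep("Header", 19), 1:1141, rep("Footer", 40)), path)

n_header <- 19
n_footer <- 40

# Count the lines instead of hard-coding the file length
n_total <- length(read_lines(path))

spectrum <- read_delim(path,
                       delim = ",",
                       skip = n_header,
                       n_max = n_total - n_header - n_footer,
                       col_names = FALSE,  # first data line is data, not a name
                       col_types = "d")    # one numeric column

nrow(spectrum)
```

This reads the file twice, which should be fine for files of about 1200 lines.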

Another option, as pointed out by Jim Hester here: https://github.com/tidyverse/readr/issues/88
is to first read the file in as lines, throw away the last 40 of them, collapse the remaining lines into a single string, and then pass that string to read_delim (or read_csv or what have you), skipping the 19 header rows.

read_lines("mydata.txt") %>% 
  head(-40) %>% 
  paste(collapse = "\n") %>% 
  read_delim(skip = 19,
             delim = ",")
#> # A tibble: 1,141 x 1
#>    Header
#>     <dbl>
#>  1      1
#>  2      2
#>  3      3
#>  4      4
#>  5      5
#>  6      6
#>  7      7
#>  8      8
#>  9      9
#> 10     10
#> # … with 1,131 more rows
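
The same "read as lines, trim, then parse" idea also works without readr. Below is a rough base R sketch, not a tested recipe for real spectrometer files: read_trimmed is a hypothetical helper name, the file is a made-up stand-in with 19 header lines, 1141 data lines and 40 footer lines, and it assumes the kept lines contain only data (no column-name line).

```r
# Hypothetical stand-in file: 19 header lines, 1141 data lines, 40 footer lines
path <- tempfile(fileext = ".txt")
writeLines(c(rep("Header", 19), as.character(1:1141), rep("Footer", 40)), path)

# Hypothetical helper: keep only lines (n_header + 1) to (last - n_footer),
# then let read.table() parse the kept lines
read_trimmed <- function(path, n_header, n_footer, ...) {
  lines <- readLines(path)
  stopifnot(length(lines) > n_header + n_footer)
  body <- lines[(n_header + 1):(length(lines) - n_footer)]
  read.table(text = body, ...)
}

spectrum <- read_trimmed(path, n_header = 19, n_footer = 40, sep = ",")
nrow(spectrum)
```

Indexing by position is used here instead of head() and tail() with negative counts so that a header or footer length of 0 still keeps all lines.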

Hope that is of some help
