Reading a csv file with metadata after column names

jtr13 · December 30, 2020, 3:29pm

I have .csv files in which the first row contains the column names, the 2nd and 3rd rows are character metadata, and the actual data starts on the 4th line, like this:

readr::read_csv(
  'Student, Test1, Test2
   , Manual Posting, Manual Posting
   Points Possible, 100, 100
   "Doe, Jane", 85, 90')
#> # A tibble: 3 x 3
#>   Student         Test1          Test2         
#>   <chr>           <chr>          <chr>         
#> 1 <NA>            Manual Posting Manual Posting
#> 2 Points Possible 100            100           
#> 3 Doe, Jane       85             90

Created on 2020-12-30 by the reprex package (v0.3.0)

Since the column names are in the first row of the file, I can't simply skip the metadata rows without losing the names. So I read the whole file, specify that the 2nd and 3rd columns are numeric, and then remove the two rows of character metadata, like this:

library(magrittr)
readr::read_csv(
'Student, Test1, Test2
 , Manual Posting, Manual Posting
 Points Possible, 100, 100
 "Doe, Jane", 85, 90', col_types = "cnn") %>% 
dplyr::slice(-c(1:2))
#> Warning: 2 parsing failures.
#> row   col expected         actual         file
#>   1 Test1 a number Manual Posting literal data
#>   1 Test2 a number Manual Posting literal data
#> # A tibble: 1 x 3
#>   Student   Test1 Test2
#>   <chr>     <dbl> <dbl>
#> 1 Doe, Jane    85    90

Created on 2020-12-30 by the reprex package (v0.3.0)

Is there a better way?

StatSteph · December 30, 2020, 3:42pm

I usually do the following in these cases:

1. Read in only the first line to get what I want to be the column names.
2. Read in again, starting where the data begins, so the column types are guessed correctly.


blob <-
  'Student, Test1, Test2
   , Manual Posting, Manual Posting
   Points Possible, 100, 100
   "Doe, Jane", 85, 90'  

# n_max = 0 reads only the header row, giving a zero-row tibble with the column names
names_df <- readr::read_csv(blob, n_max = 0)
names_df
#> # A tibble: 0 x 3
#> # ... with 3 variables: Student <chr>, Test1 <chr>, Test2 <chr>

# skip the header and the two metadata rows, then supply the saved names
dat <- readr::read_csv(blob, skip = 3, col_names = names(names_df))
dat
#> # A tibble: 1 x 3
#>   Student   Test1 Test2
#>   <chr>     <dbl> <dbl>
#> 1 Doe, Jane    85    90

Created on 2020-12-30 by the reprex package (v0.3.0)
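
If you need this for several files, the two steps above could be wrapped in a small helper. This is only a sketch: `read_csv_skip_meta()` and its `n_meta` argument are names made up for illustration (they are not part of readr), and it assumes the metadata rows always sit directly under the header row. The sample data is the same as above, written without the leading spaces.

```r
library(readr)

# Hypothetical helper (not a readr function): `n_meta` is the number of
# metadata rows between the header row and the first data row.
read_csv_skip_meta <- function(file, n_meta = 2, ...) {
  # Step 1: read only the header row to recover the column names.
  header <- read_csv(file, n_max = 0,
                     col_types = cols(.default = col_character()))
  # Step 2: skip the header plus the metadata rows, so that column types
  # are guessed from the real data only.
  read_csv(file, skip = 1 + n_meta, col_names = names(header), ...)
}

blob <- 'Student,Test1,Test2
,Manual Posting,Manual Posting
Points Possible,100,100
"Doe, Jane",85,90'

dat <- read_csv_skip_meta(blob)
dat
```

Anything passed through `...` goes to the second `read_csv()` call, so you could still give explicit `col_types` there if you don't want to rely on type guessing.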

jtr13 · December 30, 2020, 3:45pm

Thanks, I like that. Very clean and no parsing failures.
