Removing descriptions with a varying number of rows from CSV data with a function

karolina189 · March 5, 2022, 9:25pm

Hello everyone! I'm working with various CSV datasets. Each file contains additional descriptions before the data, and the number of description rows differs from file to file. I'm relatively new to RStudio, so I wanted to ask whether somebody could help me find a function for this issue.

Thank you!

FJCC · March 5, 2022, 10:12pm

I am not aware of a function that detects where metadata ends in a file; there may be one, but I don't know of it. Is there any consistent marker or characteristic in your data that could be used to recognize the end of the metadata or the beginning of the actual data? For example, are the column headings consistent?

karolina189 · March 6, 2022, 5:22pm

First of all, thank you for your reply! The only consistent marker is that the row immediately before the actual data is empty; this is the case in every file.

In addition, only the real data has more than five consecutive rows, while the description has at most four consecutive rows.

FJCC · March 6, 2022, 6:06pm

If the above means that there are at most 4 rows of metadata and the first blank line is the break between the metadata and the real data, you can use something like this:

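library(tidyverse)
# Read only the first 5 lines: at most 4 metadata rows plus the blank separator
LINES <- read_lines("~/R/Play/Dummy.csv", n_max = 5)
# Position of the first blank line, which marks the end of the metadata
FirstBlank <- which(LINES == "")[1]
# Skip the metadata and the blank line, then parse the rest as CSV
DAT <- read_csv(file = "~/R/Play/Dummy.csv", skip = FirstBlank)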
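
If the description can itself contain blank lines (several short blocks of at most four rows each), the first blank line may not be the right break. A rough sketch for that case, using the other rule mentioned above (only the real data has more than five consecutive rows), might look like the following. It uses base R functions rather than readr, and the helper names find_data_start and read_after_description, as well as the min_run parameter, are made up for illustration. It also assumes that the column header line counts as part of the data block.

```r
# Hypothetical helper: number of lines to skip so that reading starts at the
# first run of at least `min_run` consecutive non-empty lines.
find_data_start <- function(lines, min_run = 6) {
  non_empty <- trimws(lines) != ""          # TRUE for lines with content
  runs <- rle(non_empty)                    # lengths of consecutive TRUE/FALSE runs
  ends <- cumsum(runs$lengths)              # last line index of each run
  starts <- ends - runs$lengths + 1         # first line index of each run
  hit <- which(runs$values & runs$lengths >= min_run)[1]
  if (is.na(hit)) {
    stop("No block of at least ", min_run, " consecutive non-empty lines found")
  }
  starts[hit] - 1
}

# Hypothetical wrapper: read the file, drop the description, parse the rest.
read_after_description <- function(path, min_run = 6) {
  lines <- readLines(path, warn = FALSE)
  skip <- find_data_start(lines, min_run)
  read.csv(text = lines[(skip + 1):length(lines)])
}
```

Something like DAT <- read_after_description("~/R/Play/Dummy.csv") would then return the data as a data frame. Wrapping the logic in a function also makes it easy to apply to many files, for example with lapply over a vector of file paths.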
