Is there a way to skip rows on file read in while keeping the header rows?


#1

I’m working with a Qualtrics Survey dataset and for some reason the first three rows of each survey are kind of a jumbled mess that are a mix of long hand survey questions and shorthand variable names. The first row of the dataset is the shorthand variable names while I want to get rid of the next two.

I want to get rid of these rows because they kind of poison the datatype of the column casting everything as chr. Is there a way I can keep the first row and remove the next two before the columns are cast as a specific data type? My code so far?


raw.file.path <- here::here("raw_data", "FDSCI 101 Attitude Survey vS14.csv")

# I've used drop = 37 here to get rid of a column I don't need, 
# curious if there is a way to get rid of rows while keeping the first row as headers.
raw.data <- import(raw.file.path, drop = 37, setclass = "tibble")

# This removes the rows but the columns are already poisoned.
raw.data %>%
  slice(3:n()) 

Any help would be greatly appreciated!


#2

I once briefly looked for a solution to a similar problem.

The only advice I could find was to read in the first line to get the column names, then separately read in from line 4 onwards (specifying no column names) for the data and then finally set the column names from what was read in by the first step.

I would also be interested in more elegant solutions.


#3

If you use the XLConnect package, there is a readWorkbook function which lets you specify which rows to read into a dataset. See the help page here:

https://www.rdocumentation.org/packages/openxlsx/versions/4.0.17/topics/readWorkbook


#4

Here is a snippet of code I wrote for working with qualtrics back near the start of the year

It doesn’t directly answer your issue but might be handy.

In similar situations for multiple headings, I have also done one read of one row to get the headings, then a second (skip) read of the data, then applied the headings to the data