Hi,
I apologize if this is not the right place for this kind of question; I would be happy to post it somewhere else if you think that would be more appropriate.
I sometimes need to read large .csv files containing data on various items, each identified by a serial number (an integer). Since I want to make nice plots with ggplot2, I'd like the serial numbers to be read as factors, so that I can, for example, color different items differently. However, this isn't possible with read_csv(): col_types = cols(Serial = col_factor()) requires knowing the levels (i.e., all the serial numbers) before reading. I'm then stuck with one of these options:
- read the data with read_csv() and then convert the columns with mutate_at() (doable, but a bit of a waste)
- read the data with read.csv() (slooow!)
- read the data with data.table::fread (works, but I much prefer the tidyverse API)
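To make the first option concrete, here is a minimal sketch of what I currently do. The file name and the column names (Serial, value) are made up for illustration, and I use a plain mutate() here instead of mutate_at() because there is only one column to convert:

```r
library(readr)
library(dplyr)

# Hypothetical example data: a small file with an integer serial number column.
write_csv(
  data.frame(Serial = c(101L, 102L, 101L), value = c(1.5, 2.5, 3.5)),
  "foo.csv"
)

# Read the serial numbers as character, then let factor() compute the levels
# from the values that are actually present in the column.
dat <- read_csv("foo.csv", col_types = cols(Serial = col_character())) %>%
  mutate(Serial = factor(Serial))

levels(dat$Serial)
```

This works, but it means the column is parsed once as character and then converted a second time, which is the "waste" I mentioned.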
Why isn't it possible to read the columns as factors and let read_csv() compute the levels? I guess this comes from @hadley's aversion to stringsAsFactors = TRUE (which I can totally relate to). However, in my use case I'm explicitly specifying which columns to read as factors. Am I missing something obvious?
PS: a reproducible example:
library(readr)
write_csv(data.frame(x = 1:19, y = letters[1:19]), path = "foo.csv")
bar <- read_csv("foo.csv", col_types = cols(x = col_factor()), col_names = TRUE)
Error in structure(list(...), class = c(paste0("collector_", type), "collector")) :
argument "levels" is missing, with no default