Hi,
I apologize if this is not the right place for this kind of question, and I would be happy to post it somewhere else if you think it'd be more appropriate.
I sometimes need to read large .csv files containing data on various items categorized by a serial number (an integer). Since I want to make nice plots with ggplot2, I'd like the serial numbers to be read as factors, so that I can color different items differently (for example). However, this isn't possible with read_csv(): col_types = cols(Serial = col_factor()) requires knowing the levels (i.e., all the serial numbers) before reading. I'm then stuck with one of these options:
- read the data with read_csv() and then convert columns with mutate_at() (doable, but a bit of a waste)
- read data with read.csv() (slooow!)
- read data with data.table::fread (works, but I like the tidyverse API much more)
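For reference, the first option looks roughly like the sketch below. The file name serials.csv and the value column are made up for illustration; only the Serial column comes from my actual use case. The idea is to read Serial as character and let factor() work out the levels afterwards:

```r
library(readr)
library(dplyr)

# Hypothetical sample file (name and contents are assumptions for illustration)
write_csv(
  data.frame(Serial = c(101L, 102L, 101L, 103L), value = c(1.5, 2.5, 3.5, 4.5)),
  "serials.csv"
)

# Read Serial as character, so no levels have to be known up front
dat <- read_csv(
  "serials.csv",
  col_types = cols(Serial = col_character(), value = col_double())
)

# Convert after reading; factor() computes the levels from the data itself
dat <- mutate_at(dat, vars(Serial), factor)

levels(dat$Serial)
```

This works, but it means a second pass over every column I want as a factor, which is the "bit of a waste" part.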
Why isn't it possible to read the columns as factors and let read_csv() compute the levels? I guess this comes from @hadley's aversion to stringsAsFactors = TRUE (which I can totally relate to). However, in my use case I'm explicitly specifying which columns to read as factors. Am I missing something obvious?
PS reproducible example:
library(readr)
write_csv(data.frame(x = 1:19, y = letters[1:19]), path = "foo.csv")
# Fails: col_factor() expects the levels to be supplied up front
bar <- read_csv("foo.csv", col_types = cols(x = col_factor()), col_names = TRUE)
Error in structure(list(...), class = c(paste0("collector_", type), "collector")) :
argument "levels" is missing, with no default