vroom() is much slower than read_tsv().

I have a 428 MB TSV file with 44 columns and 1,306,197 rows. According to the published vroom benchmarks, vroom() should read this file into R much faster than read_tsv(). However, my own benchmark results consistently show vroom() being much slower than read_tsv(). Below is the R code I used for benchmarking.

library(tidyverse)
library(vroom)
library(microbenchmark)

microbenchmark({test = vroom("path/to/tsv", delim = "\t", col_names = TRUE, trim_ws = TRUE)})
# mean 658.7998 median 629.4985

microbenchmark({test = read_tsv("path/to/tsv", col_names = TRUE, trim_ws = TRUE)})
# mean 10.13749 median 10.06836

What could be the reason?

What are the units in both cases?
Also, perhaps you could get the taxi data used in those benchmarks, write it out to TSV, and then run the same benchmarks on that data. It might be interesting.
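A rough sketch of what that could look like, assuming the taxi data is already loaded as a data frame called taxi (the object name and file name below are just placeholders):

# Sketch only: `taxi` stands for the taxi data already loaded as a data
# frame, and "taxi.tsv" is a placeholder file name.
library(vroom)
library(readr)
library(microbenchmark)

vroom_write(taxi, "taxi.tsv", delim = "\t")   # write the data out as TSV

microbenchmark(
  vroom    = vroom("taxi.tsv", delim = "\t"),
  read_tsv = read_tsv("taxi.tsv"),
  times = 10
)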


lol... the units were seconds for read_tsv() and milliseconds for vroom(). I didn't notice that the units could be different. I should have actually used it before benchmarking it. Obviously vroom() is much faster!
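In hindsight, forcing microbenchmark() to report everything in one unit (via its unit argument, or when printing the result) would have made the comparison obvious. A rough sketch, reusing the same placeholder path as above:

# Rough sketch: report both timings in seconds so the numbers are
# directly comparable. "path/to/tsv" is a placeholder path.
library(vroom)
library(readr)
library(microbenchmark)

microbenchmark(
  vroom    = vroom("path/to/tsv", delim = "\t", col_names = TRUE, trim_ws = TRUE),
  read_tsv = read_tsv("path/to/tsv", col_names = TRUE, trim_ws = TRUE),
  times = 10,
  unit = "s"   # print all timings in seconds
)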


Glad that I asked. :laughing:

