I'm playing around with a local database where I filter an index column for n indices. The table I'm querying has 7.5 million rows and 13 columns. The code looks something like this:
```r
pacman::p_load(DBI, tidyverse)

### We need a list of indices to query the database with.
### L is the "database size" and n is the number of indices.
L <- 1000000
n <- 326
indexList <- sample(1:L, n)

### We query the database/table by passing indexList to dplyr::filter(),
### and then we load the results into memory in R with dplyr::collect().
con <- dbConnect(RPostgres::Postgres(), dbname = "example_db")
tbl(con, "example_table") %>%
  filter(index %in% indexList) %>%
  collect()
```
Collecting the data takes about five seconds, which is fine, except that when I set up a similar database on my personal laptop, running the same code takes only 0.05 seconds. I also tried rerunning the query with dbGetQuery(), but that wasn't any faster (or slower).
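For reference, the dbGetQuery() version I timed looked roughly like this. This is a sketch using the same placeholder names as above (example_db, example_table, indexList); system.time() is just base R's simple timer, so the elapsed time includes both query execution and transferring the result into R:

```r
library(DBI)

con <- dbConnect(RPostgres::Postgres(), dbname = "example_db")

L <- 1000000
n <- 326
indexList <- sample(1:L, n)

# Build the same IN-list query that dplyr::filter(index %in% indexList)
# generates behind the scenes. (A parameterized query would be safer in
# production; here the indices are known integers, so pasting is fine.)
sql <- paste0(
  "SELECT * FROM example_table WHERE index IN (",
  paste(indexList, collapse = ", "),
  ")"
)

# system.time() reports user/system/elapsed seconds for the whole call.
system.time(result <- dbGetQuery(con, sql))

dbDisconnect(con)
```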
I'm guessing this has to do with the CPU, the type of hard drive, the "closeness" of the hardware, and other factors. Is there any way for me to determine what the bottleneck is on the slower machine? I'm relatively new to benchmarking and databases, so I would appreciate any input.
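One thing I have been experimenting with, to separate database time from R time, is asking Postgres itself where the time goes. This is only a sketch under the same placeholder names as above: EXPLAIN (ANALYZE, BUFFERS) runs the query on the server and reports the chosen plan, per-step timings, and whether pages were read from cache or disk, without shipping the full result set back to R.

```r
library(DBI)

con <- dbConnect(RPostgres::Postgres(), dbname = "example_db")

indexList <- sample(1:1000000, 326)
sql <- paste0(
  "SELECT * FROM example_table WHERE index IN (",
  paste(indexList, collapse = ", "),
  ")"
)

# Prefixing the query with EXPLAIN (ANALYZE, BUFFERS) executes it and
# returns the plan as rows of text instead of the query's result set.
plan <- dbGetQuery(con, paste("EXPLAIN (ANALYZE, BUFFERS)", sql))
cat(plan[[1]], sep = "\n")

dbDisconnect(con)
```

If the plan on the slow machine shows a sequential scan (or mostly disk reads) where the fast machine shows an index scan from cache, that would point at indexing or storage rather than the CPU.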