Correctly Using Microbenchmark in R

omario · February 26, 2023, 4:53am

I am trying to learn how to use microbenchmark to time functions in R.

As an example, I simulate a few random datasets of different sizes:

# load the required packages
library(lubridate)
library(microbenchmark)
library(forecast)

my_list = list()
index = c(100, 1000, 10000, 50000, 100000, 250000, 500000, 750000, 1000000)

for (i in seq_along(index))
{
  my_data_i = data.frame(dates = sample(seq(as.Date('2010/01/01'), as.Date('2023/01/01'), by="day"), size = index[i], replace = TRUE), visits = 1)
  my_list[[i]] = my_data_i
}

I then created a function that I want to repeatedly measure on each dataset:

my_function = function(){
  # aggregate the data by week
  my_data_i_weekly <- aggregate(my_data_i$visits, list(week = week(my_data_i$dates), year = year(my_data_i$dates)), sum)

  # convert the data frame to a time series
  my_data_i_ts <- ts(my_data_i_weekly$x, start = c(min(my_data_i_weekly$week), min(my_data_i_weekly$year)), frequency = 52)

  # fit an ARIMA model using auto.arima
  my_data_i_arima <- auto.arima(my_data_i_ts)
}

In the past, I would have manually timed each iteration - for example:

results = list()
for (i in seq_along(index))
{
  start.time_i <- Sys.time()
  my_data_i = my_list[[i]]
  # run the function 100 times on the current dataset
  print(replicate(n = 100, my_function()))
  end.time_i <- Sys.time()
  time.taken_i <- end.time_i - start.time_i
  results[[i]] = time.taken_i
}

Now, I am trying to learn how to do this using the "microbenchmark" function in R.

my_list2 = list()

for (i in seq_along(index))
{
  my_data_i = my_list[[i]]
  res_i = microbenchmark(my_function(), times = 100)
  print(res_i)
  my_list2[[i]] = res_i
}

To recap - I am trying to do the following:

Run "my_function()" on my_list[[1]] 100 times and record how long it took
Run "my_function()" on my_list[[2]] 100 times and record how long it took
etc.

Am I doing this correctly?

Thanks!

nirgrahamuk · February 26, 2023, 10:45pm

Note that, depending on your choice of index values and microbenchmark times, you could be waiting a long time for results. I purposely chose numbers that execute quickly on my machine, as I am not particularly invested in the results and wanted the program to finish in a short time.

My general advice is that if you find yourself writing functions that accept no parameters, yet want different output between runs of that function, you are likely making a mistake. I rewrote your function to take a data frame as an argument, and pass each dataset to it.
I show how multiple microbenchmark summaries can be aggregated.
I personally used to use microbenchmark, but have more recently adopted the bench package, as I like how it works better; in particular, it tracks garbage collection events.
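
To illustrate the point about parameterless functions, here is a minimal sketch in base R. All names in it (datasets, current_data, count_rows_global, count_rows_arg) are hypothetical and chosen only for this illustration. A function that takes no arguments has to find its data in the environment where it was defined, so it only "sees" a new dataset if that exact variable is reassigned there.

```r
# Hypothetical toy data: two data frames of different sizes
datasets <- list(data.frame(x = 1:3), data.frame(x = 1:10))

# Variable that the parameterless function silently depends on
current_data <- datasets[[1]]

# Parameterless version: looks up `current_data` where the function was defined
count_rows_global <- function() nrow(current_data)

# Parameterised version: works on whatever data frame it is given
count_rows_arg <- function(d) nrow(d)

# Inside the anonymous function, `current_data` is a new local variable,
# so count_rows_global() never sees it and keeps using the outer one
stale <- sapply(datasets, function(d) {
  current_data <- d
  count_rows_global()
})

# Passing the data explicitly gives the expected answer for each dataset
fresh <- sapply(datasets, count_rows_arg)

print(stale)  # 3 3
print(fresh)  # 3 10
```

A top-level for loop that reassigns the variable happens to work, but the result depends on where the loop runs. Passing the data as an argument removes that hidden dependency.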

library(lubridate)
library(microbenchmark)
library(forecast)
library(tidyverse)

index <- seq(from=100,to=175,by=25)

my_list <- list()
for (i in 1:length(index))
{
  my_list[[i]] <- data.frame(
    dates = sample(
      seq(as.Date("2010/01/01"),
        as.Date("2023/01/01"),
        by = "day"
      ),
      replace = TRUE, index[i]
    ),
    visits = 1
  )
}

my_function <- function(d_) {
  my_data_i_weekly <- aggregate(
    d_$visits,
    list(
      week = week(d_$dates),
      year = year(d_$dates)
    ), sum
  )

  my_data_i_ts <- ts(my_data_i_weekly$x,
    start = c(
      min(my_data_i_weekly$week),
      min(my_data_i_weekly$year)
    ),
    frequency = 52
  )
  auto.arima(my_data_i_ts)
}

(bench_res <- map_dfr(
  my_list,
  \(x){
    microbenchmark(my_function(x), 
                   times = 5L,
                   unit = "ms") |>
      summary() |>
      mutate(xparm = nrow(x)) |>
      relocate(xparm)
  }
))
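
For comparison, a rough sketch of the same pattern with the bench package might look like the following. It assumes bench is installed, and it uses a cheap hypothetical stand-in workload (sum_squares) rather than the ARIMA function above so that it finishes quickly. The names sizes and bench_res2 are also just for illustration.

```r
library(bench)

# Hypothetical stand-in workload so the sketch runs quickly
sum_squares <- function(n) sum(seq_len(n)^2)

# Hypothetical input sizes to compare
sizes <- c(1e3, 1e4, 1e5)

# Benchmark each size and keep a few summary columns per run
bench_res2 <- do.call(rbind, lapply(sizes, function(n) {
  res <- bench::mark(sum_squares(n), iterations = 20, check = FALSE)
  data.frame(
    n         = n,
    median_s  = as.numeric(res$median),    # median time in seconds
    mem_bytes = as.numeric(res$mem_alloc), # memory allocated, in bytes
    n_gc      = res$n_gc                   # garbage collections observed
  )
}))

print(bench_res2)
```

The n_gc column is where the garbage collection tracking mentioned above shows up. Timings will differ between machines, so no expected output is shown.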

omario · February 27, 2023, 7:48am

Thank you so much for your answer!

Can you please re-explain why my original approach isn't optimal?
Can you please re-explain why the bench library is better than the microbenchmark library?
If you have time, I would be interested in seeing how you use the "bench" library.

Thank you so much!
