Large max time for microbenchmark of Rcpp call

LaurentPlagne · April 16, 2021, 4:50pm

Hi,
I am puzzled by the timing distribution I get when benchmarking Rcpp functions with the microbenchmark package.
The R code is here:

R script

library(Rcpp)
library(microbenchmark)
library(ggplot2)

sourceCpp('test.cpp')

nrows <- 200
ncols <- 200
l <- nrows*ncols
#data <- runif(l)
r <- matrix(runif(l), nrow = nrows, ncol = ncols) # random numeric matrix
a <- matrix(numeric(l), nrow = nrows, ncol = ncols) # zero numeric matrix
b <- matrix(numeric(l), nrow = nrows, ncol = ncols) # zero numeric matrix
c <- matrix(as.numeric(1:l), nrow = nrows, ncol = ncols) # incremented numeric matrix (col major)

myrfunc <- function(a_in) # argument is passed by value (sigh)
{  
  # use the dimensions of the argument, not of the global `a`
  ncols=ncol(a_in)
  nrows=nrow(a_in)
  for (j in 1:ncols)
  {
    for (i in 1:nrows)
    {
      a_in[i,j] <- a_in[i,j]+i+100.0*j
    }
  }
  return(a_in)
}

allm <- microbenchmark("r function" = {a<-myrfunc(a)},
                       "C++ matrix_update " =  {matrix_update(b)},
                       "C++ matrix_update_bis " =  {matrix_update_bis(b)})


autoplot(allm)
ggsave("test_perf.png")

test.cpp

#include <Rcpp.h>
using namespace Rcpp;


// Column-major linear index of element (i,j) in a matrix with nr rows (nc is unused)
inline size_t getindex(size_t i,size_t j,size_t nr,size_t nc) {
  return i+nr*j;
}

// [[Rcpp::export]]
void matrix_update(NumericMatrix  a) {
  //std::cout<< "nrows="<<a.nrow()<<std::endl;
  //std::cout<< "ncols="<<a.ncol()<<std::endl;
  const size_t nr=a.nrow();
  const size_t nc=a.ncol();
  const size_t l=nr*nc;
  
  for (size_t j=0 ; j<nc ; j++){
    for (size_t i=0 ; i<nr ; i++){
      a[getindex(i,j,nr,nc)]+=double(i+1)+100.0*double(j+1);
    }
  }
}

// [[Rcpp::export]]
void matrix_update_bis(NumericMatrix  a) {
  //std::cout<< "nrows="<<a.nrow()<<std::endl;
  //std::cout<< "ncols="<<a.ncol()<<std::endl;
  const size_t nr=a.nrow();
  const size_t nc=a.ncol();
  for (size_t j=0 ; j<nc ; j++){
    for (size_t i=0 ; i<nr ; i++){
      a(i,j)+=double(i+1)+100.0*double(j+1);
    }
  }
}

which gives me the following output:

The minimum and mean C++ times look fine, but the maximum is very large. I am a complete R beginner and I do not understand what is happening.

Thank you for your help.
Laurent
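The two exported functions in test.cpp above are expected to touch the same elements, because R stores matrices in column-major order and `getindex` computes exactly that linear offset. Here is a minimal standalone sketch of the idea. It is an illustration only: `std::vector` stands in for the `NumericMatrix` storage, and the names `col_major_index`, `update_linear` and `update_sequential` are hypothetical, not part of Rcpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major linear index of element (i, j) in a matrix with nr rows.
// Same formula as getindex in test.cpp, without the unused nc argument.
inline std::size_t col_major_index(std::size_t i, std::size_t j, std::size_t nr) {
  assert(i < nr);  // row index must stay inside the column
  return i + nr * j;
}

// Stand-in for matrix_update: column loop outside, row loop inside,
// addressing the flat storage through the linear index.
std::vector<double> update_linear(std::size_t nr, std::size_t nc) {
  std::vector<double> a(nr * nc, 0.0);  // assumption: plain vector as storage
  for (std::size_t j = 0; j < nc; ++j) {
    for (std::size_t i = 0; i < nr; ++i) {
      a[col_major_index(i, j, nr)] += double(i + 1) + 100.0 * double(j + 1);
    }
  }
  return a;
}

// Same update written as one pass over the flat storage, recovering (i, j)
// from the offset k. It visits elements in the same order as the nested loops.
std::vector<double> update_sequential(std::size_t nr, std::size_t nc) {
  std::vector<double> a(nr * nc, 0.0);
  for (std::size_t k = 0; k < a.size(); ++k) {
    const std::size_t i = k % nr;  // row
    const std::size_t j = k / nr;  // column
    a[k] += double(i + 1) + 100.0 * double(j + 1);
  }
  return a;
}
```

Since both variants produce the same result and walk memory contiguously, the indexing style alone is unlikely to explain a large maximum time.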

LaurentPlagne · April 18, 2021, 7:52am

Thanks @eddelbuettel for the explanation, given in RcppCore/Rcpp issue #1157 on GitHub ("large max time for a Rcpp call (microbenchmark)"):

That's standard R behaviour of, every now and then, requiring a call to garbage collection (i.e. function gc() from R). It would be the same if you coded the same test function 'by hand' in C or C++ and interfaced it by hand---there is nothing nefarious here that Rcpp does and that we could simply remove. Most easy fixes have, in fact, been applied by now to a project that is well over ten years old.

nirgrahamuk · April 19, 2021, 12:03am

filter_gc

If `TRUE` remove iterations that contained at least one garbage collection before summarizing. If `TRUE` but an expression had a garbage collection in every iteration, filtering is disabled, with a warning.

When you benchmark these, do you get a warning saying that gc filtering is disabled?
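To make the documented `filter_gc` rule concrete, here is a minimal sketch of the filtering logic as the quoted text describes it. This is not bench's actual implementation: the `Iteration` and `Filtered` types, the function names, and the timings used with them are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// One benchmark iteration: elapsed time and whether a garbage collection
// ran while it was being timed (hypothetical record layout).
struct Iteration {
  double seconds;
  bool had_gc;
};

// Result of filtering: the iterations to summarize, plus a flag that is
// true when filtering had to be disabled.
struct Filtered {
  std::vector<Iteration> kept;
  bool filtering_disabled;
};

// Drop iterations that contained a garbage collection. If every iteration
// contained one, keep them all and report that filtering was disabled
// (the case where the documentation says a warning is issued).
Filtered filter_gc_iterations(const std::vector<Iteration>& runs) {
  std::vector<Iteration> kept;
  std::copy_if(runs.begin(), runs.end(), std::back_inserter(kept),
               [](const Iteration& it) { return !it.had_gc; });
  if (kept.empty() && !runs.empty()) {
    return Filtered{runs, true};
  }
  return Filtered{kept, false};
}

// Largest elapsed time among the given iterations.
double max_seconds(const std::vector<Iteration>& runs) {
  assert(!runs.empty());
  return std::max_element(runs.begin(), runs.end(),
                          [](const Iteration& a, const Iteration& b) {
                            return a.seconds < b.seconds;
                          })->seconds;
}
```

With made-up timings such as two GC-free runs of about 40 microseconds and one 5 millisecond run that hit a collection, `max_seconds` on the filtered set drops back to the 40 microsecond range, which is the effect a GC-filtering benchmark summary is meant to produce.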

LaurentPlagne · April 19, 2021, 7:19am

Thank you for the tip!
It took me a while to figure out that you were referring to another R benchmarking package (bench):

new R script

library(Rcpp)
library(ggplot2)
library(bench)
library(beeswarm)

sourceCpp('test.cpp')

nrows <- 200
ncols <- 200
l <- nrows*ncols
#data <- runif(l)
r <- matrix(runif(l), nrow = nrows, ncol = ncols) # random numeric matrix
a <- matrix(numeric(l), nrow = nrows, ncol = ncols) # zero numeric matrix
b <- matrix(numeric(l), nrow = nrows, ncol = ncols) # zero numeric matrix
c <- matrix(as.numeric(1:l), nrow = nrows, ncol = ncols) # incremented numeric matrix (col major)

myrfunc <- function(a_in) # argument is passed by value (sigh)
{  
  # use the dimensions of the argument, not of the global `a`
  ncols=ncol(a_in)
  nrows=nrow(a_in)
  for (j in 1:ncols)
  {
    for (i in 1:nrows)
    {
      #cat("i=",i," j=",j, "a[",i,",",j,"]=",a_in[i,j],"\n")
      a_in[i,j] <- a_in[i,j]+i+100.0*j
      #cat("i=",i," j=",j, "a[",i,",",j,"]=",a_in[i,j],"\n")
    }
  }
  return(a_in)
}


mu=bench::mark(matrix_update(b),matrix_update_bis(b),myrfunc(a),filter_gc = TRUE, check = FALSE)
autoplot(mu)
ggsave("bench_perf.png")

It indeed allows for filtering out the gc overhead:

Thank you again !

nirgrahamuk · April 19, 2021, 7:31am

Ah, sorry for making a riddle of it!
I saw another post earlier in the day where bench was used and mistook it for yours. Anyway, glad it helped.
