Why might Rcpp be slow to return a large object to R?

I have a package that constructs large Rcpp::List objects and returns them to R. There's a long delay between reaching the return x; statement and the object arriving in the R console. Has anyone else encountered this problem?

I haven't managed to reproduce the problem in a minimal example, so I'm hoping someone remembers hitting the same thing. My guess is that the object is being copied, but something else may be going on too, because the delay can be longer than the time it took to construct the original object.

Here's a counter-example that doesn't have the delay. It creates a function f() in C++ that is available to run in R. The function constructs a large list, and returns it to R. Just before it returns, it prints "returning ..." to the console, so you can tell if there's a delay while the object is returned. On my machine, there is no delay, but the equivalent in my package has a big delay.

Rcpp::cppFunction(
  'Rcpp::List f() {
    int n = 50000000;
    std::vector<int> i;
    i.reserve(n);
    for (int j=1; j <= n ; ++j) {
      i.push_back(0);
    }
    Rcpp::List out = Rcpp::List::create(i, i, i, i, i, i, i, i, i, i,
                                        i, i, i, i, i, i, i, i, i, i);
    Rcpp::Rcout << "returning ...\\n";
    return out;
  }')
x <- f()

I have also tried a more complex example that is closer to what my package does. The large object is constructed in a class as a member, and the function f() returns that member. This still doesn't reproduce the delay. What else could I try?

myclass.h

#include <Rcpp.h>

class myclass {
  public:
    Rcpp::List big_;
    myclass() {
      int n = 50000000;
      std::vector<int> i;
      i.reserve(n);
      for (int j=1; j <= n ; ++j) {
        i.push_back(0);
      }
      big_ = Rcpp::List::create(i, i, i, i, i, i, i, i, i, i,
                                i, i, i, i, i, i, i, i, i, i);
    }
};

myscript.cpp

#include <Rcpp.h>
#include "myclass.h"

// [[Rcpp::export]]
Rcpp::List f() {
  myclass out;
  Rcpp::Rcout << "Returning ...\n";
  return out.big_;
}

R

Rcpp::sourceCpp("myscript.cpp")
x <- f()

I managed to reproduce the delay. I suspect it's caused by copying and garbage collection. The solution is to avoid storing lots of Rcpp::List objects in a std::vector. Instead, store them in another Rcpp::List.

This version has the delay.

Rcpp::cppFunction(
  'Rcpp::List f() {
    int n = 20000;
    std::vector<Rcpp::List> v;
    v.reserve(n);
    Rcpp::List l(n);
    for (int i = 0; i < n ; ++i) {
      v.push_back(Rcpp::List(1));
      l[i] = v[i];
    }
    Rcpp::Rcout << "returning ...\\n";
    return l;
  }')
x <- f()

This version does not have the delay.

Rcpp::cppFunction(
  'Rcpp::List g() {
    int n = 20000;
    Rcpp::List v(n);
    for (int i = 0; i < n ; ++i) {
      v[i] = Rcpp::List(1);
    }
    Rcpp::Rcout << "returning ...\\n";
    return v;
  }')
x <- g()
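
One possible reason the delay shows up after the "returning ..." message rather than before it: in C++, local variables are destroyed when the function exits, which happens after the last statement before return has run. In f() the std::vector is a local variable, so its 20000 elements are cleaned up at that point, whereas g() has no such local container. Here is a minimal sketch of that ordering in plain C++ (no Rcpp). Tracked, build_and_return and run_demo are hypothetical names made up for the illustration, and Tracked only stands in for Rcpp::List. Whether Rcpp::List cleanup is actually the expensive part is an assumption, not something measured here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Event log for the sketch: records the order in which things happen.
static std::vector<std::string> events;

// Hypothetical stand-in for Rcpp::List (assumption: the real class does
// some per-object cleanup work when it is destroyed). Here the destructor
// only records that it ran.
struct Tracked {
  ~Tracked() { events.push_back("element destroyed"); }
};

// Same shape as f(): fill a local std::vector, log a message, return.
int build_and_return(int n) {
  std::vector<Tracked> v(n);          // n elements, each with a destructor
  events.push_back("returning ...");  // last statement before the return
  return n;                           // v is destroyed only after this line
}

// Runs the sketch and returns the recorded order of events.
std::vector<std::string> run_demo(int n) {
  events.clear();
  int result = build_and_return(n);
  assert(result == n);
  events.push_back("caller resumed");
  return events;
}
```

Calling run_demo(3) gives "returning ..." first, then three "element destroyed" entries, then "caller resumed". So any per-element cleanup cost is paid between the message and the caller getting its result, which is where the delay appears.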