Calling C or C++ for R


I have been dipping my toe into "Advanced R". Chapter 25 covers Rcpp, which explains that some R code can be made faster by rewriting it in C++. But when I look at purrr's GitHub repository, I see that 13% of the code is in C while only 0.1% is in C++.

What is the relationship between C and C++ as it relates to R?

How are decisions made about which language to call?

Does C have an analogous package to Rcpp?


Looks like purrr is entirely in C. I wouldn't be the one to ask why that is.

On the more general questions.

R can link directly against compiled code in other languages such as C and Fortran. However, to do that correctly you have to be an expert in R's calling interface and in how to protect/unprotect objects. To my knowledge there is no C equivalent of Rcpp.

Rcpp uses the fact that C++ has objects with controlled lifetimes to manage the R-to-C++ interface (calling protect/unprotect to cooperate with the garbage collector). To use Rcpp you just need to know a bit of R and C++ and read through a demo; you don't need expert-level knowledge to write performant and safe code.
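To make the lifetime idea concrete, here is a minimal C++ sketch of the RAII pattern against a hypothetical toy protect stack. `Object`, `protect`, `unprotect` and `ProtectGuard` are made-up names, not R's or Rcpp's API, and Rcpp's real classes differ in detail. The constructor roots the object and the destructor unroots it, so the stack is rebalanced on every exit path, including a C++ exception.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical toy object and protect stack (not R's or Rcpp's API).
struct Object { int value; };

std::vector<Object*> protect_stack;  // objects listed here count as "rooted"

void protect(Object* obj) { protect_stack.push_back(obj); }
void unprotect(std::size_t n) { protect_stack.resize(protect_stack.size() - n); }

// RAII guard: roots the object on construction and unroots it on destruction.
// Popping the top entry is correct because nested local guards are destroyed
// in reverse order of construction (LIFO), as C++ scoping guarantees.
class ProtectGuard {
 public:
  explicit ProtectGuard(Object* obj) : obj_(obj) { protect(obj_); }
  ~ProtectGuard() { unprotect(1); }
  ProtectGuard(const ProtectGuard&) = delete;             // no double unprotect
  ProtectGuard& operator=(const ProtectGuard&) = delete;
  Object* get() const { return obj_; }

 private:
  Object* obj_;
};

// Normal return: the guard's destructor runs at the closing brace.
int doubled_value(Object* obj) {
  ProtectGuard guard(obj);
  return guard.get()->value * 2;
}

// Exceptional exit: stack unwinding still runs the guard's destructor.
void fail_after_protect(Object* obj) {
  ProtectGuard guard(obj);
  throw std::runtime_error("error after protecting");
}
```

The user of `doubled_value` never writes a matching `unprotect` call, which is the convenience the post describes.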

In conclusion I would say for most coders the C++ path is far easier and safer.


Most people will find C++ (via Rcpp) easier to write. However, Rcpp comes with some disadvantages, primarily compilation speed: Rcpp code takes much longer to compile than the equivalent C code, making iteration during development slower. Usually the greater feature set of C++ dominates this consideration, but the problems in purrr tend to be quite direct, so the richer features of C++ have little room to shine. purrr also gives me and @lionel a place to practice our C code, which is important because it helps us understand what is going on at a very low level and improves our ability to judge when the cost of using Rcpp is worth it.


For very low-level packages, Rcpp can be tricky. It generally preserves SEXPs rather than protecting them, which can have subtle performance implications that vary across platforms (releasing a preserved object takes linear time, and the order of preservation is not guaranteed by C++ semantics in some corner cases). C++ and R's longjump-based semantics don't mix well and can lead to memory leaks and undefined behaviour. The fragile workarounds Rcpp uses for this problem make evaluating R code (which is the core business of purrr, being concerned with maps) very costly. purrr used to be written in C++ but was rewritten in C for performance. I have put a lot of work into fixing this in Rcpp, but the fix is not yet enabled by default and only works with R 3.5 or later, thanks to a new primitive that Luke added to fix the problem.
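A simplified C++ model of the cost difference mentioned above, assuming a stack for protection and a single list for preserved objects. The general shape is an assumption and the names are hypothetical; this is not R's actual code. Popping the stack is constant time, whereas releasing a preserved object has to search the list first.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Stack-style protection: release pops the most recent entries, O(1),
// but releases must happen in LIFO order.
struct ProtectStack {
  std::vector<const void*> slots;
  void protect(const void* p) { slots.push_back(p); }
  void unprotect(std::size_t n) { slots.resize(slots.size() - n); }
};

// Preserve-list style: objects may be released in any order, but each
// release must search the list, so its cost grows with the list length.
struct PreserveList {
  std::list<const void*> items;
  void preserve(const void* p) { items.push_front(p); }  // newest at the front

  // Returns how many entries were examined (the whole list if not found).
  std::size_t release(const void* p) {
    std::size_t visited = 0;
    for (auto it = items.begin(); it != items.end(); ++it) {
      ++visited;
      if (*it == p) {
        items.erase(it);
        return visited;
      }
    }
    return visited;
  }
};
```

In this model, releasing the most recently preserved object is cheap while releasing older ones scans further, which is why the order in which destructors release objects can show up in performance.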

More generally, while Rcpp is easy (and absolutely recommended for most uses), C++ is a vastly more complex language than C. We spent many days fixing corner-case bugs and crashes in dplyr that were due to subtle interactions of the complex semantics in the language. When you add Rcpp to the mix, the complexity is even higher because you need to understand how Rcpp is implemented in addition to the intricacies of C++. Finally, note that R core members themselves don't recommend using C++ for writing R code.


About protection: it is hard to get right in C, but the task has been made much simpler thanks to rchk, a static analyser written by Tomas Kalibera. One downside of C++ is that it can't be reliably analysed by rchk; for example, rchk doesn't support references. This might make your package more brittle with respect to garbage collection. For instance, there's currently a bug in the Rcpp constructor for vectors: it doesn't protect its argument, yet it allocates and might trigger a garbage collection. That means that this simple code exposes the fresh object to a collection:

#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::CharacterVector myfun() {
  return Rf_allocVector(STRSXP, 10);  // fresh, unprotected SEXP handed to the constructor
}

This particular case was uncovered by rchk and has just been fixed in Rcpp by Romain if I'm not mistaken. But the point is that protection is tricky even with C++, and using C++ will prevent rchk from working to its full capacity. That's another reason for using C to implement low level packages like rlang, purrr, and vctrs.
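A toy C++ model of the failure mode described above, assuming a hypothetical heap in which every allocation may trigger a collection that marks unrooted objects dead. None of these names are R's API. `wrap_unsafe` allocates before rooting its argument, as the buggy constructor did; `wrap_safe` roots first.

```cpp
#include <memory>
#include <vector>

// Hypothetical toy heap (not R's implementation). Objects are only marked
// dead, never freed early, so a caller can check whether one was "collected".
struct Obj { bool alive = true; };

std::vector<std::unique_ptr<Obj>> heap;  // owns every allocated object
std::vector<Obj*> protect_stack;         // roots that survive a collection

bool is_protected(const Obj* o) {
  for (const Obj* p : protect_stack) {
    if (p == o) return true;
  }
  return false;
}

// Marks every unrooted object as dead.
void collect() {
  for (const auto& o : heap) {
    if (!is_protected(o.get())) o->alive = false;
  }
}

// As with R's allocators, any allocation may trigger a collection first.
Obj* alloc() {
  collect();
  heap.push_back(std::make_unique<Obj>());
  return heap.back().get();
}

// Mirrors the bug: allocates scratch memory before rooting its argument,
// so the collection inside alloc() marks x dead.
Obj* wrap_unsafe(Obj* x) {
  alloc();
  protect_stack.push_back(x);
  return x;
}

// Fixed ordering: root the argument first, then allocate.
Obj* wrap_safe(Obj* x) {
  protect_stack.push_back(x);
  alloc();
  return x;
}
```

In a real program the "dead" object would be reused memory rather than a flag, which is why such bugs surface as rare, hard-to-reproduce crashes.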


Obviously you are an expert here. But a couple of points.

Rcpp is an excellent package; I doubt core R members dis-recommend it. That's not quite what was said, but I think it's a point that needs to be clarified and emphasized for others.

Any package can have a bug, and a fixed bug is not a design choice. Obviously Rcpp intends to handle these cases correctly, and it sounds like it now does. Finding the error and submitting an issue, test, or fix is the thing to do that helps the community, and it sounds like that is what happened.

My guess is most R users want to use Rcpp to manipulate numeric structures, not R-language elements, so Rcpp is very optimized and tested for these cases. So the choices you feel are optimal for purrr or rlang may or may not be the same for other users on more common tasks. However, the original question was obviously about purrr, so it is nice to see the two primary authors comment.


> My guess is most R users want to use Rcpp to manipulate numeric structures, not R-language elements- so Rcpp is very optimized and tested for these cases. So what choices you feel are optimal for a purrr or an rlang may or may not be the same for other users on more common tasks.

I said: "Rcpp is absolutely recommended for most cases".

> I doubt core-R members dis-recommend it.

They (at least Luke and Tomas) dis-recommend using C++.


Ah sorry if I confused opinions about Rcpp and opinions about C++. Thanks for the clarification.


Thanks for the detailed answer! As someone who does not know anything about the R to C interface it was cool to hear some of the benefits and difficulties there.

I'm still a little confused about the recommendation against using C++ for writing R code; I think I'm missing the implied context or situation. Does this mean that if one is working with R objects and wants to use C++, they should use Rcpp rather than try to work directly with those structures? Or do the opinions of those against C++ just happen to conflict with those of people who think using Rcpp is a good thing?

I think it's mostly about C++ RAII mechanisms not mixing well with the C-style longjumps used in the R API. But this will get better in the future.
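A minimal sketch of why this matters: a C++ exception unwinds the stack and runs destructors, which is exactly what RAII cleanup relies on. A longjmp does not; the C++ standard makes it undefined behaviour when a longjmp would skip a non-trivial destructor, so this sketch deliberately shows only the exception path. `Resource` and `cleanups_run` are illustrative names.

```cpp
#include <stdexcept>

// Counts how many cleanup actions (destructors) have run.
int cleanups_run = 0;

// A resource whose destructor performs cleanup: the RAII pattern.
struct Resource {
  ~Resource() { ++cleanups_run; }
};

void fails_with_exception() {
  Resource r;  // cleanup is tied to this scope
  throw std::runtime_error("error raised in C++");
}  // unwinding runs ~Resource() before control leaves this frame

// Glue code must catch C++ exceptions at the boundary before returning to C.
bool call_and_catch() {
  try {
    fails_with_exception();
  } catch (const std::runtime_error&) {
    return true;
  }
  return false;
}
```

If an R error longjumped through `fails_with_exception` instead, `~Resource()` would be skipped and the behaviour would be undefined, which is the mismatch described above.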

I would assume longjump issues arise mainly when writing C/C++ code that calls back into R code. To me, that means code that computes on mathematical objects (matrices, vectors) purely in C/C++ (i.e. code that implements a numeric algorithm, taking data from R and returning a result to R) shouldn't be at great risk from them.

I have less "C/C++ from R" experience than you do. But for what it's worth, I would say it is in fact good practice to start with Rcpp and a vanilla (C-like) subset of C++ until one runs into a compelling reason to switch to the direct C interfaces.
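As a sketch of that "C-like subset" approach, assuming a hypothetical `weighted_mean` helper: the numeric kernel works only on raw arrays and never calls R's API, so nothing inside it can raise an R error or longjmp. The thin Rcpp wrapper shown in the comment is likewise illustrative.

```cpp
#include <cstddef>

// Plain, C-like numeric kernel: reads raw arrays only and makes no R API
// calls, so no R error (and hence no longjmp) can originate inside it.
double weighted_mean(const double* x, const double* w, std::size_t n) {
  double num = 0.0;
  double den = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    num += x[i] * w[i];
    den += w[i];
  }
  return num / den;  // caller must ensure the weights do not sum to zero
}

// Illustrative Rcpp wrapper (requires Rcpp, so not compiled here):
// // [[Rcpp::export]]
// double weighted_mean_r(Rcpp::NumericVector x, Rcpp::NumericVector w) {
//   return weighted_mean(x.begin(), w.begin(),
//                        static_cast<std::size_t>(x.size()));
// }
```

Keeping the R-facing layer this thin also makes the kernel easy to test without R.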

> I would assume the possibility of encountering longjump issues when writing C/C++ code that calls back R code

Or any calls to the C API of R. There are places where Rcpp calls that API without protection because doing so would hurt performance. This will get better once unwind-protect is enabled by default (we already enabled it in dplyr).
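The unwind-protect idea can be sketched in plain C++ with `setjmp`/`longjmp`, assuming a toy error mechanism in which an "R error" is a longjmp to the currently installed handler. All names here are hypothetical; R's own C-level function for this (`R_UnwindProtect`, added in R 3.5) has a different signature. The point is the shape: intercept the jump, run cleanup, then resume the jump outward. No frame skipped by these jumps has non-trivial destructors.

```cpp
#include <csetjmp>

// Toy stand-in for R's error machinery (not R's real API): an "R error"
// is a longjmp to whatever handler is currently installed.
static std::jmp_buf* current_handler = nullptr;

[[noreturn]] void raise_error() { std::longjmp(*current_handler, 1); }

// Run body(); whether it returns normally or jumps, run cleanup(jumped),
// and if it jumped, resume the jump towards the previous handler.
void unwind_protect(void (*body)(), void (*cleanup)(bool jumped)) {
  std::jmp_buf local;
  std::jmp_buf* const outer = current_handler;
  if (setjmp(local) == 0) {
    current_handler = &local;  // intercept jumps raised inside body()
    body();
    current_handler = outer;
    cleanup(false);
  } else {
    current_handler = outer;   // restore before resuming the jump
    cleanup(true);
    std::longjmp(*outer, 1);   // continue unwinding to the outer handler
  }
}

// Demo pieces: a body that "errors" and a cleanup that records its calls.
int cleanup_calls = 0;
bool cleanup_saw_jump = false;

void failing_body() { raise_error(); }
void record_cleanup(bool jumped) {
  ++cleanup_calls;
  cleanup_saw_jump = jumped;
}

// Installs a top-level handler and reports whether the error reached it.
bool run_demo() {
  std::jmp_buf top;
  current_handler = &top;
  if (setjmp(top) == 0) {
    unwind_protect(failing_body, record_cleanup);
    return false;  // not reached: failing_body() always jumps
  }
  current_handler = nullptr;
  return true;     // the jump was resumed after cleanup ran
}
```

In a C++ binding, the cleanup step is where the glue would switch from the C-style jump to a C++ exception so that destructors in C++ frames run normally.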

Numerical computations are safe regarding longjumps. But Rcpp makes it easy to bypass R's value semantics, something which is frowned upon by the R core developers and which is unsafe with ALTREP objects.
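A rough C++ analogy for the value-semantics point, assuming `std::shared_ptr` as a stand-in for an R object bound to two variables (this is not how R or Rcpp are implemented). R duplicates a shared object before modifying it; writing through the vector's memory directly, which Rcpp vectors make easy, mutates the buffer that every alias sees. `set_first_cow` and `set_first_in_place` are hypothetical names.

```cpp
#include <memory>
#include <vector>

// Two "variables" may share one buffer, as after y <- x in R.
using Buffer = std::shared_ptr<std::vector<double>>;

// Value semantics: duplicate a shared buffer before modifying it,
// so other variables bound to the old buffer keep their values.
void set_first_cow(Buffer& v, double value) {
  if (v.use_count() > 1) {
    v = std::make_shared<std::vector<double>>(*v);  // copy on write
  }
  (*v)[0] = value;
}

// Writing straight into the shared memory: every alias observes the change.
void set_first_in_place(const Buffer& v, double value) {
  (*v)[0] = value;
}
```

With `set_first_in_place`, modifying what looks like a local argument silently changes the caller's object, which is the kind of surprise R's semantics are designed to prevent.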

> But for what it is worth I would say it is in fact good practice to start with Rcpp

I have never suggested to start with C, quite the opposite.


Tomas Kalibera just posted about the issues of using C++ with R: https://developer.r-project.org/Blog/public/2019/03/28/use-of-c---in-packages/
