When and why would you want to use RC as your OO system?

I've been teaching some of the material from Advanced R this week and realized that, while I can talk about when you might want to use S4 rather than S3, I have no idea (or good examples) of when you would want to use reference classes. Thoughts?
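As a minimal sketch of the difference the question is about: an S4 object follows R's usual copy-on-modify semantics, while a Reference Class (built with base R's `setRefClass()`) is mutable, so a method can change the object in place. The `Counter4`/`CounterRC` class names, the `bump` method, and the `count` field are made up for illustration.

```r
library(methods)

# S4: value semantics. Changing a slot inside a function modifies a copy.
setClass("Counter4", slots = c(count = "numeric"))
bump4 <- function(x) {
  x@count <- x@count + 1
  x  # the caller only sees the change if it reassigns the result
}

# RC: reference semantics. Methods update fields in place with `<<-`.
CounterRC <- setRefClass("CounterRC",
  fields = list(count = "numeric"),
  methods = list(
    bump = function() {
      count <<- count + 1
      invisible(.self)
    }
  )
)

s4 <- new("Counter4", count = 0)
invisible(bump4(s4))   # result discarded
s4@count               # still 0

rc <- CounterRC$new(count = 0)
rc$bump()              # no reassignment needed
rc$count               # now 1
```

The trade-off is visible in the last lines: the RC version is less verbose to use, but any code holding `rc` can change it.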


In a presentation about OO, I skipped over R's original RC (https://numeract.github.io/dallas-roo/#50) and talked only about R6 (which also has "reference" semantics).

One package that uses R's original reference classes is openxlsx (https://github.com/awalker89/openxlsx/blob/master/R/class_definitions.R). The idea is that you need to keep a pointer to the original object (i.e., the workbook tree) and let the methods modify its data in place instead of returning a copy of the original object.
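The pattern can be sketched with a toy Reference Class. This is a hypothetical `Workbook` class with made-up `addSheet`/`setData` methods, not openxlsx's actual class definition; it only shows methods mutating the object's own `sheets` field rather than returning a modified copy.

```r
library(methods)

# Hypothetical workbook-like class; NOT openxlsx's actual Workbook definition.
Workbook <- setRefClass("Workbook",
  fields = list(sheets = "list"),  # starts as an empty list
  methods = list(
    addSheet = function(name) {
      # modify this object's own data instead of returning a modified copy
      sheets[[name]] <<- data.frame()
      invisible(.self)
    },
    setData = function(sheet, x) {
      if (!sheet %in% names(sheets)) stop("no such sheet: ", sheet)
      sheets[[sheet]] <<- x
      invisible(.self)
    }
  )
)

wb <- Workbook$new()
wb$addSheet("Sheet1")               # no `wb <- ...` reassignment needed
wb$setData("Sheet1", head(mtcars))
names(wb$sheets)
```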

R6 does this much better in my opinion. I use R6 for caching in rflow (e.g., https://github.com/numeract/rflow/blob/master/R/eddy-r6.R). In this case, an R6Eddy instance stores the caching data for all cached functions; R6 simplifies the manipulation of the structure and allows keeping only one representation of the cache store throughout the R session without the need to sync several such instances.
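A simplified sketch of the "one shared cache store" idea, assuming a hypothetical `CacheStore` R6 class and a hypothetical `memoise_into()` helper (this is not rflow's `R6Eddy` code). Because every wrapped function holds a reference to the same object, there is only one store and nothing to keep in sync.

```r
library(R6)

# Hypothetical single cache store shared by all memoised functions.
CacheStore <- R6Class("CacheStore",
  public = list(
    data = NULL,
    initialize = function() {
      # an environment works as a mutable hash map
      self$data <- new.env(parent = emptyenv())
    },
    has   = function(key) exists(key, envir = self$data, inherits = FALSE),
    fetch = function(key) get(key, envir = self$data, inherits = FALSE),
    put   = function(key, value) assign(key, value, envir = self$data),
    size  = function() length(ls(self$data))
  )
)

# Hypothetical helper: wrap `f` so its results live in the shared `store`.
memoise_into <- function(f, name, store) {
  force(f); force(name); force(store)
  function(x) {
    key <- paste0(name, ":", deparse(x))
    if (!store$has(key)) store$put(key, f(x))  # compute only on a cache miss
    store$fetch(key)
  }
}

store <- CacheStore$new()
cached_sq   <- memoise_into(function(x) x^2, "sq", store)
cached_cube <- memoise_into(function(x) x^3, "cube", store)

cached_sq(3); cached_sq(3); cached_cube(2)
store$size()   # both functions write into the same store
```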


In modeling contexts, one use case is avoiding copies of a large dataset. For example, the GauPro package uses R6 classes to allow stateful updates of the posterior. This prevents redundant computation when new data becomes available. I believe the package is intended as a backend for Bayesian optimization.
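To illustrate stateful posterior updates in general (not GauPro's Gaussian-process code), here is a sketch using a deliberately simple conjugate model: a Normal prior on an unknown mean with a known noise variance. The `NormalMeanPosterior` class and its fields are hypothetical. The object keeps the posterior as state, so each new batch of data costs one cheap update instead of a refit on everything seen so far.

```r
library(R6)

# Hypothetical conjugate model: unknown mean, known noise variance sigma2,
# Normal prior on the mean. The posterior is stored inside the object.
NormalMeanPosterior <- R6Class("NormalMeanPosterior",
  public = list(
    mean = NULL,       # posterior mean
    precision = NULL,  # posterior precision (1 / variance)
    sigma2 = NULL,     # known noise variance
    initialize = function(prior_mean = 0, prior_var = 1, sigma2 = 1) {
      self$mean <- prior_mean
      self$precision <- 1 / prior_var
      self$sigma2 <- sigma2
    },
    update = function(x) {
      # precision adds up; the mean is a precision-weighted average
      new_precision <- self$precision + length(x) / self$sigma2
      self$mean <- (self$mean * self$precision + sum(x) / self$sigma2) / new_precision
      self$precision <- new_precision
      invisible(self)
    }
  )
)

post <- NormalMeanPosterior$new(prior_mean = 0, prior_var = 1, sigma2 = 1)
post$update(c(1, 1))   # posterior mean 2/3
post$update(2)         # posterior mean 1, same as updating with c(1, 1, 2) at once
post$mean
```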


A number of r-lib packages use R6, for example processx and progress.

They can also be useful for exposing C++ classes to R; you can then build a more user-friendly functional layer on top. See the R arrow package for that general idea.
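A pure-R sketch of the two-layer design, with all names hypothetical. In a real package the R6 methods would call compiled code (for example through Rcpp) and hold an external pointer; here plain R stands in for the C++ side. This shows the general idea only, not arrow's actual code.

```r
library(R6)

# Low-level, object-oriented layer. Plain R stands in for C++ here.
Accumulator <- R6Class("Accumulator",
  public = list(
    values = numeric(0),
    push = function(x) {
      self$values <- c(self$values, x)
      invisible(self)
    },
    total = function() sum(self$values)
  )
)

# User-facing functional layer: ordinary functions that hide the `$` calls.
acc_new <- function() Accumulator$new()
acc_push <- function(acc, x) {
  acc$push(x)
  invisible(acc)   # return the object so calls compose
}
acc_total <- function(acc) acc$total()

acc <- acc_new()
acc_push(acc, 1:3)
acc_push(acc, 10)
acc_total(acc)
```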


Thanks to @MikeBadescu, @alexpghayes, and @davis for these answers! My high-level takeaway is that R6 is useful when you are manipulating very large datasets, to avoid the copy-on-modify that R usually does. Is there more to it than that?
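One related consequence of skipping copy-on-modify: assigning an R6 object to a new name does not copy it, so both names refer to the same object, and an independent copy has to be requested explicitly with `$clone()`. A minimal sketch with a hypothetical `Box` class, compared against a plain list:

```r
library(R6)

Box <- R6Class("Box", public = list(value = 0))

a <- Box$new()
b <- a            # no copy: `a` and `b` are the same object
b$value <- 5
a$value           # 5

c1 <- a$clone()   # explicit (shallow) copy
c1$value <- 99
a$value           # still 5

# A list, by contrast, is copied on modify
l1 <- list(value = 0)
l2 <- l1
l2$value <- 5
l1$value          # 0
```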


I would add the case where the object has a "state". Without reference semantics, one could use the following construct:

obj <- list(state = 0)   # all RC objects can be seen as lists (simplification)

f1 <- function(obj_, x) {
  main_result <- x * 2   # placeholder calculation
  obj_$state <- 2        # modifies a local copy; the caller's `obj` is untouched

  list(res = main_result, obj = obj_)   # need to return the modified obj
}

lst <- f1(obj, 21)   # no side effect, but messy
obj <- lst$obj       # doing this many times is not fun
res <- lst$res

RC/R6 make things easier to work with:

library(R6)
Obj <- R6Class("Obj", public = list(state = 0))   # minimal R6 class with a `state` field
obj <- Obj$new()

f1 <- function(obj_, x) {
  main_result <- x * 2   # placeholder calculation
  obj_$state <- 2        # being R6, this modifies `obj` defined in the global env ==> side effect!

  main_result
}

res <- f1(obj, 21)  # nicer to work with, obj already updated (but be careful about side effects)