I've been teaching some of the material from Advanced R this week, and realized I can talk about when you might want to use S4 rather than S3, but I have no idea (and no good examples) of when you would want to use reference classes. Thoughts?
In a presentation about OO I skipped over R's original RC (https://numeract.github.io/dallas-roo/#50) and talked only about R6 (which is of the "reference" type).
One package that uses R's original reference classes is openxlsx
(https://github.com/awalker89/openxlsx/blob/master/R/class_definitions.R). The idea is that you need to keep a pointer to the original object (i.e., the workbook tree) and allow its methods to modify the object's own data instead of returning a copy of the original object.
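To make that concrete, here is a minimal sketch using base R's `setRefClass`. The `Workbook` class and its `add_sheet` method are hypothetical names made up for illustration; they are not taken from openxlsx, whose class definitions are far more involved.

```r
library(methods)

# Hypothetical toy class for illustration only; not the openxlsx definition.
Workbook <- setRefClass(
  "Workbook",
  fields = list(sheets = "character"),
  methods = list(
    add_sheet = function(name) {
      # `<<-` assigns to the field of this object, so it is modified in place
      sheets <<- c(sheets, name)
      invisible(.self)
    }
  )
)

wb <- Workbook$new()
wb$add_sheet("Summary")
wb$add_sheet("Data")

alias <- wb               # no copy is made: both names point to the same object
alias$add_sheet("Extra")
length(wb$sheets)         # 3, because `wb` saw the change made through `alias`
```

Note that the methods never return a modified workbook that the caller has to reassign; the one object is updated wherever it is referenced.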
R6 does this much better in my opinion. I use R6 for caching in rflow
(e.g., https://github.com/numeract/rflow/blob/master/R/eddy-r6.R). In this case, an R6Eddy
instance stores the caching data for all cached functions; R6 simplifies the manipulation of the structure and allows keeping only one representation of the cache store throughout the R session without the need to sync several such instances.
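As a rough sketch of the single-shared-cache idea (requires the R6 package), something like the following would work. `CacheStore`, `cached_call`, and the method names are all hypothetical; rflow's actual `R6Eddy` class does considerably more than this.

```r
library(R6)

# Hypothetical cache store for illustration; not rflow's R6Eddy.
CacheStore <- R6Class(
  "CacheStore",
  public = list(
    store = NULL,
    initialize = function() {
      # An environment used as a key-value store
      self$store <- new.env(parent = emptyenv())
    },
    has = function(key) exists(key, envir = self$store, inherits = FALSE),
    fetch = function(key) get(key, envir = self$store, inherits = FALSE),
    put = function(key, value) {
      assign(key, value, envir = self$store)
      invisible(self)
    }
  )
)

# Hypothetical helper: compute once per key, then reuse the stored value
cached_call <- function(cache, key, fn, ...) {
  if (!cache$has(key)) {
    cache$put(key, fn(...))   # updates the shared cache in place
  }
  cache$fetch(key)
}

cache <- CacheStore$new()
n_calls <- 0
square <- function(x) {
  n_calls <<- n_calls + 1     # count how often the function really runs
  x^2
}

cached_call(cache, "sq_4", square, 4)   # computed
cached_call(cache, "sq_4", square, 4)   # served from the cache
n_calls                                 # 1
```

Because `cache` has reference semantics, every function that receives it sees the same store, so nothing has to be passed back or synced.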
In modeling contexts, one use case is avoiding a copy of a large dataset. For example, the GauPro package uses R6 classes to allow for stateful updates to the posterior. This prevents redundant computation when new data becomes available. I believe the package is intended as a backend for Bayesian optimization.
A number of r-lib packages use R6, including processx and progress.
They can also be useful for exposing C++ classes to R; you can then create a more user-friendly functional layer on top of that. See the R arrow package for that general idea.
Thanks to @MikeBadescu, @alexpghayes, and @davis for these answers! My high-level takeaway is that R6 is useful when you are manipulating very large datasets, to avoid the copy-on-modify that R usually does. Is there more to it than that?
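For reference, the copy-on-modify behavior mentioned here can be seen in base R by comparing a list (value semantics) with an environment (reference semantics, which is what RC and R6 objects build on). The function names below are made up for the illustration.

```r
# Hypothetical helpers: each tries to increment a counter stored in its argument
bump_list <- function(x) {
  x$n <- x$n + 1   # modifies a local copy of the list
  x
}
bump_env <- function(e) {
  e$n <- e$n + 1   # modifies the environment the caller passed in
  invisible(e)
}

lst <- list(n = 0)
bump_list(lst)     # the modified copy is returned, then discarded
lst$n              # 0: the caller's list is unchanged

env <- new.env()
env$n <- 0
bump_env(env)
env$n              # 1: the caller's environment was updated in place
```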
I would add the case where the object has a "state". One could use the following construct:
obj <- list(state = 0, ....) # all RC objects can be seen as lists (simplification)
f1 <- function(obj_, ...) {
  main_result = .... # calculation
  obj_$state = 2
  list(res=main_result, obj=obj_) # need to return the modified obj
}
lst <- f1(obj, ...) # no side effect but messy
obj <- lst$obj # doing this many times is not fun
res <- lst$res
RC/R6 make things easier to work with:
obj <- R6(state = 0, ....) # not a proper R6 definition, just illustrating the concept
f1 <- function(obj_, ...) {
  main_result = ....
  obj_$state = 2 # being R6 this modifies `obj` defined in the global env ==> side effect!
  main_result
}
res <- f1(obj, ...) # nicer to work with, obj already updated (but be careful about side effects)
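For anyone who wants to run this, here is a minimal runnable version of the same idea (requires the R6 package). The `Tracker` class name and the `sum(x)` calculation are placeholders I made up to stand in for the elided parts above.

```r
library(R6)

# Hypothetical class standing in for `obj` in the sketch above
Tracker <- R6Class(
  "Tracker",
  public = list(
    state = 0,
    initialize = function(state = 0) {
      self$state <- state
    }
  )
)

f1 <- function(obj_, x) {
  main_result <- sum(x)   # placeholder for the real calculation
  obj_$state <- 2         # side effect: modifies the caller's object
  main_result
}

obj <- Tracker$new(state = 0)
res <- f1(obj, 1:10)
res                       # 55
obj$state                 # 2, updated without reassigning `obj`

# When an independent copy is wanted, R6 objects can be cloned explicitly
snapshot <- obj$clone()
obj$state <- 3
snapshot$state            # still 2
```

The explicit `clone()` is the flip side of the convenience: since nothing is copied automatically, you have to ask for a copy whenever you want to protect an object from later side effects.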