Postponing a costly computation with variables – are quosures the right tool?

maxheld83 · August 10, 2018, 11:38am

I recently learned about quosures and tidy evaluation, and it feels very magical.
But because of the whole "a-little-bit-of-knowledge-is-a-dangerous-thing" situation, I want to make sure I'm using it right.

Requirements:

Let's say I have some vector x for which I would like to implement a fairly expensive function compute_a_lot().
Let's say further that compute_a_lot() should also work on subsets of x, say x[2:5], and that, crucially, when working on the subset, its result should depend on some information about all of x (say, implausibly, length(x)).
Assume that I have already implemented a method for '[.x' to retain attributes on subsetting.
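For readers unfamiliar with this setup, such a subsetting method might look roughly like the sketch below. It assumes the S3 class is literally called "x" and is built by a hypothetical constructor new_x(); the method copies every attribute of the original object onto the subset except names, which `[` already handles.

```r
# Hypothetical constructor for an S3 class called "x"
new_x <- function(values) {
  structure(values, class = "x")
}

# Hypothetical subsetting method that keeps the class and custom attributes
`[.x` <- function(x, i) {
  out <- unclass(x)[i]          # default `[`, so no recursion into `[.x`
  attrs <- attributes(x)
  attrs$names <- NULL           # names were already subset by `[`
  attributes(out) <- c(attributes(out), attrs)
  out
}
```

Because unclass() removes the class before subsetting, the inner `[` uses the default method, which drops the custom attributes; the last two lines put them back.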

(My real use case is a print method for x, which needs to ensure all subsets of x are scaled equally).

Two naive solutions:

A naive solution to this problem would be to run compute_a_lot() whenever x is created and save the result as an attribute of x, which a method of compute_a_lot() could then simply return when called on x[1:2]. But that would be very painful for the user, because compute_a_lot() would run before it is actually needed, if it is needed at all.
Another naive solution would be to just save the whole x as an attribute of x, which would be retained on subsetting and available to compute_a_lot() when called on x[1:2]. But that just seems disgusting.
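Both naive options fit in a few lines. In this sketch compute_a_lot() is a hypothetical stand-in that only uses the length of the full vector, mirroring the example above, and the constructor and attribute names are made up for illustration.

```r
# Hypothetical stand-in for the expensive computation:
# it needs information about the full vector (here just its length)
compute_a_lot <- function(full) {
  length(full)
}

# Naive solution 1: compute eagerly at creation time and cache the result
new_x_eager <- function(values) {
  structure(values, class = "x", cached = compute_a_lot(values))
}

# Naive solution 2: keep the whole vector around as an attribute
new_x_copy <- function(values) {
  structure(values, class = "x", full = values)
}
```

The first pays the full cost up front even if the result is never used. The second defers the cost but makes every object carry the complete vector with it.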

So I thought, hey, I know a solution – quosures!

Whenever x is created, I run something like

attr(x, "compute_a_lot_value") <- rlang::quo(compute_a_lot(x, ...))

Then, when compute_a_lot() is actually called on some x[1:3] and its result is needed, I run

rlang::eval_tidy(attr(x, "compute_a_lot_value"))

This seems to accomplish what I want: x always carries around with it the instructions (expression + environment) to calculate compute_a_lot(), but these instructions are acted upon only when it's really necessary.
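A quosure bundles an expression with the environment it should be evaluated in. The same idea can be sketched in base R with quote() and eval(); this is only an analogue for illustration, not how rlang implements quosures, and new_x(), run_deferred() and the attribute names are hypothetical.

```r
# Hypothetical expensive computation on the full vector
compute_a_lot <- function(full) length(full)

new_x <- function(values) {
  x <- values
  env <- environment()   # this call's frame, in which `x` is the full vector
  # Store the instructions (unevaluated call + environment), not the result
  structure(
    values,
    class = "x",
    deferred_expr = quote(compute_a_lot(x)),
    deferred_env = env
  )
}

# Evaluate the stored instructions only when the result is needed
run_deferred <- function(obj) {
  eval(attr(obj, "deferred_expr"), envir = attr(obj, "deferred_env"))
}
```

Creating an object with new_x() runs nothing expensive; run_deferred() evaluates compute_a_lot(x) in the stored frame, where x still refers to the complete vector.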

I understand that quosures are mostly used in the context of NSE, so I'm a little worried that I'm not using them right.

Is this (delaying computation) a proper use of quosures?

PS: The whole thing happens in the context of x being an S3 class with print methods, but that didn't seem necessary for a reprex of the main issue.
PPS: I am vaguely aware of (and very excited about) promises and future for long-running computations, but asynchronicity (?) is not the issue here; I just want to delay the computation.

hadley · August 10, 2018, 12:19pm

Unless the computation to be performed is supplied by the user, a quosure is just complicating things. Instead use the classic way of defining computation separately from performing it: a function.
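Following that advice, a minimal function-based sketch might look like this. The constructor stores a zero-argument closure and nothing is computed until the closure is called. compute_a_lot(), new_x() and the call counter are hypothetical, added only to make the deferral visible.

```r
# Hypothetical expensive computation; the counter records how often it runs
n_calls <- 0
compute_a_lot <- function(full) {
  n_calls <<- n_calls + 1
  length(full)
}

new_x <- function(values) {
  force(values)                              # fix the data at creation time
  deferred <- function() compute_a_lot(values)  # defines, does not run
  structure(values, class = "x", compute_a_lot = deferred)
}
```

Creating v <- new_x(1:10) leaves n_calls at 0; only attr(v, "compute_a_lot")() actually runs the computation.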

maxheld83 · August 10, 2018, 12:42pm

Ah, I see.

So, instead of the above, I just used purrr::partial().

On creation:

attr(x, "compute_a_lot") <- purrr::partial(compute_a_lot, x = x)

And when the results are needed:

attr(x, "compute_a_lot")()

It works.
Did I get it right?
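One way to check this approach is to confirm that the stored function still sees the full vector after subsetting. The base-R sketch below uses a hand-written closure in place of purrr::partial(compute_a_lot, x = x); new_x(), the `[.x` method and the stand-in compute_a_lot() are all hypothetical.

```r
compute_a_lot <- function(x) length(x)   # stand-in for the expensive step

new_x <- function(values) {
  force(values)
  structure(values, class = "x",
            compute_a_lot = function() compute_a_lot(values))
}

# Subsetting keeps the class and the stored function
`[.x` <- function(x, i) {
  structure(unclass(x)[i], class = "x",
            compute_a_lot = attr(x, "compute_a_lot"))
}
```

With this in place, calling the stored function on a two-element subset still returns 10 for an object built from 1:10, because the closure refers to the vector captured at creation.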

hoelk · August 10, 2018, 9:52pm

I am not sure what you want to achieve in detail, but it seems like your S3 class x should preserve the information about x that you want to retain. In your example I would do something like:

x <- 1:1e3
attr(x, "original_length") <- length(x)
class(x) <- "foo"

compute_a_lot.foo <- function(x, original_length = attr(x, "original_length")){
  ...
}

(obviously you would not construct x like that, but have a constructor function for it)
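Filling in that sketch with a constructor and a `[.foo` method gives something like the following. The generic compute_a_lot(), the constructor new_foo() and the placeholder body of compute_a_lot.foo() are assumptions for illustration; only the original_length attribute idea comes from the post above.

```r
# Generic with an S3 method for class "foo"
compute_a_lot <- function(x, ...) UseMethod("compute_a_lot")

compute_a_lot.foo <- function(x, original_length = attr(x, "original_length"), ...) {
  # Placeholder computation: size of this subset relative to the original
  length(x) / original_length
}

# Constructor: record the information about the full vector once
new_foo <- function(values) {
  structure(values, original_length = length(values), class = "foo")
}

# Subsetting keeps the recorded information and the class
`[.foo` <- function(x, i) {
  structure(unclass(x)[i],
            original_length = attr(x, "original_length"),
            class = "foo")
}
```

Only a single integer travels with each subset. Repeated subsetting such as x[1:100][1:10] keeps reporting the length of the object originally built by new_foo().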

The whole idea of retaining attributes that refer to the original object on subsetting seems a bit wonky to me, though (how would you know what to preserve if you subset several times?).

I am also not sure what you are doing in your example, but if what you store captures the environment of x, and that environment contains the whole vector x, I am not sure how that is better than just storing a copy of x in the first place.
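To make that concrete: a stored closure keeps its defining environment alive, and in a constructor like the hypothetical new_x() below, that environment holds the complete original vector. This sketch only inspects a hand-built closure; it makes no claim about the internals of purrr::partial().

```r
compute_a_lot <- function(x) length(x)   # stand-in computation

new_x <- function(values) {
  force(values)
  f <- function() compute_a_lot(values)  # closure over the full vector
  structure(values, class = "x", compute_a_lot = f)
}

v <- new_x(1:1000)
f <- attr(v, "compute_a_lot")
# The closure's environment still contains the complete original vector
full_copy <- get("values", envir = environment(f))
length(full_copy)  # 1000
```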