Confused about tidymodels parallel processing with necessary metaprogramming

In the above link, the last code snippet shows the following:

num_pcs <- 3

recipe(mpg ~ ., data = mtcars) %>% 
  # Bad since num_pcs might not be found by a worker process
  step_pca(all_predictors(), num_comp = num_pcs)

recipe(mpg ~ ., data = mtcars) %>% 
  # Good since the value is injected into the object
  step_pca(all_predictors(), num_comp = !!num_pcs)

In this case, does !! make the R command recognised by all the parallel cores and their respective threads as one orchestrated task, instead of each core creating its own 3-component PCA (perhaps 9 in total; if so, who knows how garbage collection is handled)? And is that why the output ultimately comes out as intended?

Either way, how do we know when to use !!? Would such methods be necessary for vfold_cv, grid, or any other steps of the modelling process?

Hi @pathos,

The use of !! is a bit technical. !! is used to force immediate evaluation of an R expression. This is especially useful when you want part of an expression to be evaluated, even if the rest of the expression is deferred to be evaluated at a later time. For example, step_pca() is simply the definition of a recipe step, but we don't want all_predictors() to actually be evaluated until the recipe is actually trained (with prep()). On the other hand, we want to make sure the expression passed to the num_comp argument is evaluated right away so that even on a worker process the recipe step will know how many components we want (i.e. 3).

Using !! ensures that the number 3 itself is injected into the expression, rather than a reference to num_pcs, an object in the global environment that may not be available on a parallel worker.

Basically, you can use !! to force the immediate evaluation of some R code. This only works in specific contexts. More info can be found by reading various parts of the {rlang} documentation on tidy evaluation and metaprogramming (Functions for Base Types and Core R and Tidyverse Features • rlang).

Here is a very stripped down example of how !! works. We can use rlang::expr() to capture an R expression without evaluating it, or, we can use !! to force its evaluation.

x <- 10

rlang::expr(x)
#> x

rlang::expr(!!x)
#> [1] 10
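
To connect this back to the worker-process problem in the question, here is a rough sketch using only rlang and base R. It does not start real parallel workers: worker_env is a hypothetical stand-in for a fresh worker session, built as an environment that knows about list() but none of the objects defined in the calling session. The names deferred, injected, and worker_env are made up for illustration.

```r
library(rlang)

num_pcs <- 3

# Capture two expressions without evaluating them.
deferred <- expr(list(num_comp = num_pcs))    # keeps the symbol num_pcs
injected <- expr(list(num_comp = !!num_pcs))  # the value 3 is spliced in

# Assumption: a crude stand-in for a fresh worker session. Its parent is the
# empty environment, so it cannot see num_pcs; only list() is made available.
worker_env <- new.env(parent = emptyenv())
worker_env$list <- base::list

# Evaluating the deferred expression fails because num_pcs cannot be found;
# the error message is captured instead of stopping the script.
deferred_result <- tryCatch(
  eval(deferred, worker_env),
  error = function(e) conditionMessage(e)
)

# The injected expression already contains the value, so it evaluates fine.
injected_result <- eval(injected, worker_env)

print(deferred_result)
print(injected_result)
```

The deferred version should report that num_pcs cannot be found, while the injected version should return a list whose num_comp is 3, since the value travelled with the expression.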
Either way, how do we know when to use !!? Would such methods be necessary for vfold_cv, grid, or any other steps of the modelling process?

On this point, you can use !! any time you pass an object as an argument to a tidymodels function that delays execution in some way, and the object lives in the global environment.

Resample functions like vfold_cv(), for example, do not delay their execution, so you would not need to force any execution.
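
As a rough illustration of the difference between delayed and immediate execution, the sketch below uses two made-up helpers (delayed_step() and eager_resample() are hypothetical, not tidymodels functions). The first captures its argument with rlang::enquo(), which is one way a function can delay evaluation; the second evaluates its argument straight away, as an ordinary R function does.

```r
library(rlang)

n <- 5

# Hypothetical helper that delays evaluation: it stores the argument as a
# quosure instead of computing its value.
delayed_step <- function(num_comp) {
  enquo(num_comp)
}

# Hypothetical helper that evaluates its argument immediately, so only the
# value is stored in the result.
eager_resample <- function(v) {
  list(v = v)
}

# Without !!, the delayed helper holds on to the symbol n.
stored_plain <- quo_get_expr(delayed_step(n))

# With !!, the value of n is injected before the argument is captured.
stored_injected <- quo_get_expr(delayed_step(!!n))

# The eager helper never stores a reference to n, so there is nothing to force.
stored_eager <- eager_resample(n)$v

print(stored_plain)
print(stored_injected)
print(stored_eager)
```

Only the delayed helper ever keeps a reference to n, which is why forcing the value with !! matters there and is unnecessary for functions that evaluate their arguments up front.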

To add on to Matt's excellent answer, the functions in parsnip and recipes are the main functions that delay execution, so using !! is a good idea there.

PSOCK parallel processing is very exacting in terms of where it looks for arguments. Multicore (forked) processing, which is not available on Windows, is very forgiving.
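
A small sketch of that difference, using only the base parallel package and assuming the session is allowed to start a local worker process. The variable names psock_sees_num_pcs and fork_sees_num_pcs are made up for illustration, and the forked part is skipped on Windows.

```r
library(parallel)

num_pcs <- 3

# PSOCK: the worker is a separate, freshly started R session, so objects
# created in this session are not there unless they are sent explicitly.
cl <- makePSOCKcluster(1)
psock_sees_num_pcs <- clusterEvalQ(cl, exists("num_pcs"))[[1]]
stopCluster(cl)

print(psock_sees_num_pcs)

# Multicore: forked workers start as copies of the current session, so they
# can still see num_pcs. Forking is only attempted on Unix-alikes.
if (.Platform$OS.type == "unix") {
  fork_sees_num_pcs <- unlist(
    mclapply(1:2, function(i) exists("num_pcs"), mc.cores = 2)
  )
  print(fork_sees_num_pcs)
}
```

On a typical setup the PSOCK worker should not find num_pcs while the forked workers should, which is why code that runs fine with multicore can fail once it is moved to a PSOCK cluster.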

I don't think that there is an obvious place to look for this type of advice. We do have advice in ?recipes::selections. Any suggestions about another (centralized) place to note these issues?

@mattwarkentin Thank you very much for the answer, it definitely clarified all the uncertainties I had.

@Max I'm unsure about a centralised place, but making it searchable could be the first step. For example, ?!! in R does not bring up a help page, for understandable reasons, but documentation with examples for various situations would be good, I think (I'm unsure how I would bring up such a help page in RStudio). Online searches would then lead people to the documentation, and people would naturally spread the knowledge in their own forms through their own blogs, tutorials, videos, etc. Currently, when I search 'R metaprogramming' online, I cannot find a short, vignette-like help page for !! that clarifies an example situation, only textbooks or long texts.

In general, the code for !! and !!! is in the rlang package (although this is not obvious).

On that pkgdown site, there is a "Metaprogramming" section that has a lot of helpful articles. None of them deal with tidymodels or parallel processing, though. I've made an issue to add some of the implications of using !!! with tidymodels to our book before it goes into print.
