Should tidyeval be abandoned?

I like it a lot; it's much more incremental and thorough than the original. The outstanding questions look like they'll be addressed by the headings at the bottom.

I think this whole topic is arising out of the fact that there is more than one persona for people using rlang, including those using it to

  • program with tidy eval,
  • operate on the language and
  • work within existing tidyverse functions (with !!, !!!, .data, etc.).

Finishing out the current vignette is probably sufficient to address confusion for the first user (though the vignette should absolutely be in rlang, not dplyr).

The second user likely just needs functions to be well-named and documentation pages to be thorough, so the index will lead to the necessary information. A vignette may be nice, but isn't necessary.

The third user includes a lot of tidyverse users, who will sooner or later need to refer to a variable from within a tidy eval context or otherwise encounter rlang unintentionally. This user needs to know less, and doesn't really want to learn the nitty-gritty of tidy eval to make mutate behave as desired. This user needs a pared-down vignette of how to manage scope in tidy eval so as to get the right data in the right place in mutate. This vignette belongs in dplyr.

I'm not 100% sure the third persona is quite right or even exists, but I suspect there's something there. Ultimately, though, even if the dplyr vignette is more of a cookbook of a couple of common cases than a full framework, it can refer to the more complete rlang vignette for context without worrying about being overkill itself, and would thus be useful for other personas regardless.

Thanks for the feedback, glad it's working out!

@alistaire I think maybe the first and third personas are the same group of people? We probably shouldn't expose quasiquotation to casual users because unquoting involves mental gymnastics that are not entirely trivial to learn. Or did you mean the difference between (1) authors of quoting APIs and (3) authors of functions/wrappers?

About operating on the language, you can do it with either base functions or rlang functions. The plan is to export all rlang functions for working on the language in a tidyeval package. I'm not sure how much we should expose in the programming vignette; it currently only shows how to create symbols. It seems like modifying calls would not be a typical task, but there's probably a need for creating calls, e.g. the recent thread about negating a list of symbols. It is possible to teach how to create calls with quasiquotation, as in expr(-c(!!! syms)), so we tend to think that call creation should not feature prominently.
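A minimal sketch of that call-creation case, assuming a current version of rlang (and dplyr for the last line). The column names are illustrative, and `cols` stands in for the list of symbols from that thread.

```r
library(rlang)
library(dplyr)

# A list of symbols, e.g. built from a character vector of column names
cols <- syms(c("mpg", "cyl"))

# Quasiquotation builds the negated call: !!! splices each symbol
# in as a separate argument to c()
neg_call <- expr(-c(!!!cols))
neg_call
#> -c(mpg, cyl)

# The finished call can then be unquoted into a quoting function
names(select(mtcars, !!neg_call))
```

No call-constructor functions are needed here; expr() plus splicing covers it, which is the argument for not featuring call creation prominently.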

About the vignette, we'll probably make a second one that looks more like a cookbook. On top of that, the next edition of adv-r that Hadley is currently working on will feature sections on meta-programming with quasiquotation and purrr.

@lionel, I have only had a chance to take a brief look at the draft, but it looks good.

I would consider myself as aspiring to be in @alistaire's third category, but without a compelling case to use these functions yet. Hopefully the forthcoming cookbook will enable me to re-assess how best to integrate the new functions.

@lionel, The new doc looks great. I think it's very clear.

The main question that came up for me was when to use quosures rather than bare expressions. Each of the examples so far works when I replace quo with expr. I see this is the next subject you are looking to document so I'll wait to learn the answer.
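One way to see where the two diverge, as a sketch assuming a current rlang: a quosure records the environment it was created in, while a bare expression does not. `make_both()` is a hypothetical helper written only for this illustration.

```r
library(rlang)

# Hypothetical helper: builds the same code twice, once as a bare
# expression and once as a quosure, both referring to a local variable
make_both <- function() {
  threshold <- 10
  list(
    bare    = expr(threshold * 2),  # no environment attached
    quosure = quo(threshold * 2)    # remembers this function's environment
  )
}

captured <- make_both()

# The quosure finds `threshold` in the environment it carries
eval_tidy(captured$quosure)
#> [1] 20

# The bare expression is evaluated in the caller's environment instead,
# where `threshold` does not exist, so this is expected to fail
try(eval_tidy(captured$bare))
```

When an expression only refers to data columns, or to variables visible where it is evaluated, both forms behave the same, which may be why swapping quo for expr appears to work in simple examples.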

I have been spending a fair amount of time reading the draft chapters in the 2nd edition of the Advanced-R book, and I feel like I have gotten over the hump of understanding tidy eval. It blows my mind how useful tidy-eval is, and I have several blog posts swimming around in my head about how to do various things with it. Since now I feel like I understand it, I probably can't read your new vignette with a beginner's eye, but I have spent some time thinking about the mental blocks that I had in trying to understand it in the past.

I think that one of my main challenges has been understanding the difference between UQ() and eval_tidy().

UQ() and the !! operator unquote their argument. It gets evaluated immediately in the surrounding context.

But what UQ() does is different from evaluation... it still requires the expression to be evaluated somewhere down the line.

library(rlang)
y <- "shiny"
x <- expr(y)
x
#> y
UQ(x)
#> y
UQ(UQ(x))
#> y
eval_tidy(x)
#> [1] "shiny"

I guess I still find it a little bit confusing why double-unquoting an expression doesn't evaluate to "shiny" in the above. My mental model of tidy-eval functions is as a handshake, and I think that learning it requires understanding both sides of the handshake. You pass in an expression or quo with the trust that somewhere inside the function it will be explicitly evaluated -- whereas normally evaluation is automatic. It looks like your vignette is doing that through the discussion of the evaluation functions.
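The handshake described above can be sketched in a few lines, assuming a current rlang; `column_mean()` is a hypothetical function. The caller passes code, the function captures it with enquo(), and evaluation only happens when eval_tidy() is called explicitly, with the data frame's columns in scope.

```r
library(rlang)

# Hypothetical function illustrating both sides of the handshake
column_mean <- function(df, col) {
  col <- enquo(col)                    # capture the caller's code, unevaluated
  values <- eval_tidy(col, data = df)  # explicit evaluation, columns in scope
  mean(values)
}

column_mean(mtcars, mpg)
column_mean(mtcars, mpg * 2)   # any expression over the columns works
```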

I think another stumbling block I had is that for basic dplyr programming, sym & syms, and even expr are more useful and easier to understand than quo. Again, it looks like the newer materials have moved in that direction, which is good.
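To illustrate why sym() and syms() feel natural for basic dplyr programming, here is a sketch assuming current rlang and dplyr: a character vector of column names (for example chosen at run time) is turned into symbols and spliced into group_by() with !!!. The column choice is illustrative.

```r
library(rlang)
library(dplyr)

group_names <- c("cyl", "gear")   # e.g. chosen at run time
group_syms  <- syms(group_names)  # list of symbols: cyl, gear

mtcars %>%
  group_by(!!!group_syms) %>%     # !!! splices each symbol as its own argument
  summarise(n = n())
```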

I wonder if it'd be better if the visible definition of UQ(), UQS() etc was something like

UQ <- function(x) {
  abort("`UQ()` cannot be called directly")
}

Also I now regret that we made it possible to write expr(rlang::UQ(x)) with the namespace qualifier. It suggests that UQ() is lexically scoped but it is not, it is a syntactic operator. Using the prefix doesn't really make sense, it is comparable to writing a model formula like this: lm(disp ~ stats::`+`(cyl, am), mtcars).

Maybe we should deprecate expr(rlang::UQ()) with a warning, and use your definition of UQ()/UQS() above.

This would be a breaking change, and it is used in CRAN packages (as revealed by a GitHub search), but it might be worth it.

To answer your question about the difference between UQ() and eval_tidy(), they are completely unrelated. UQ() and !! work during capture (or quoting), not during evaluation. You can even supply expressions created by expr() to base::eval(), even if you used quasiquotation. By the time you call eval() the unquoting has already happened.

Edit: So the reason we have eval_tidy() is to support quosures, not quasiquotation.
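Going back to the earlier `y <- "shiny"` example, here is a sketch of the capture-versus-evaluation point, assuming a current rlang. `!!` is resolved while expr() builds the expression, so the result is ordinary R code that base::eval() can run.

```r
library(rlang)

y <- "shiny"
x <- expr(y)

# Unquoting happens while expr() captures its argument:
# !!x is replaced by the value of x, which is the symbol `y`
captured <- expr(toupper(!!x))
captured
#> toupper(y)

# Evaluation is a separate, later step; base::eval() is enough
# because no quosure is involved
eval(captured)
#> [1] "SHINY"
```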

Yes, I think I understand the difference at this point. I'm just trying to relate what were my past stumbling blocks. Both the documentation and the vignette say that the argument to UQ/!! is "evaluated", so you can see the confusion.

Maybe making it so you can't call UQ() at the top level would lead people to correct usage. But when you include it in a function call, it feels like a top-level call since from reading the code it isn't obvious that it is surrounded with a quoting function. I understand that it really IS inside of an enquo call from inside the function, but I believe that was a jump for me previously.

library(dplyr)
library(rlang)
mydf <- tibble(col1 = "a", col2 = "b")
mycol <- sym("col1")
# !! only means "unquote" here because select() quotes its arguments
select(mydf, !!mycol)

This discussion is part of why I think the syntactic form !! is better than the functional form UQ(). Something very special happens when unquoting, and having special syntax to help reason about it is important.

it feels like a top-level call since from reading the code it isn’t obvious that it is surrounded with a quoting function

I agree, for this reason we have long term plans to let the IDE know whether a function is quoting or not. Quoting functions should have a special color or boldness, or at least be indicated by a tooltip or similar. Lisp IDEs do this to distinguish macros from functions and we have a similar problem in R.

Nice job, I found it very well explained. Still a few missing words/typos and some sentences that could be made clearer, such as

The issue of referential transparency to do with the difficulty of passing contextual variables in order to vary the inputs of quoting functions.

but it's a great introduction. What I find missing is the link with base R. When I was starting out with R about 10 years ago I ran into such issues with subset, without knowing any of this terminology, and ended up "inventing" a combination of quote, bquote, eval to achieve this type of behaviour.

a <- 90
cond1 <- quote(mpg > 21 & hp > a)
cond2 <- quote(drat > 4)
cond <- bquote(.(cond1) & .(cond2))
subset(mtcars, eval(cond))

It would be nice to see this kind of parallel with the pure base R functions (as.name, substitute, eval, evalq, environments, etc.) to get a sense of where tidyeval differs (and therefore its necessity), but also how it solves such problems more elegantly with a robust and consistent framework that was (presumably) missing from base R.
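As a rough parallel to the base R example above, assuming current rlang and dplyr: expr() takes the role of quote(), !! the role of bquote()'s .(), and dplyr::filter() the role of subset().

```r
library(rlang)
library(dplyr)

a <- 90

# expr() plays the role of quote()
cond1 <- expr(mpg > 21 & hp > a)
cond2 <- expr(drat > 4)

# !! plays the role of bquote()'s .()
cond <- expr(!!cond1 & !!cond2)

# filter() quotes its argument, so the pre-built condition is unquoted into it;
# `a` is not a column of mtcars, so it is looked up in the calling environment
filter(mtcars, !!cond)
```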

This is particularly important, I think, if tidyeval is to become accepted by old-school R users, rather than forking into its own language (a serious concern nowadays, generating tensions I believe).

A couple of things that I find frustrating:

  • the non-standard syntax: how does it even work? !! and !!! are valid R syntax (if a bit useless) for double and triple negation, so what happens when the parser sees them in the wrong context? Do we get a boolean value, or a local evaluation? Out of curiosity, I tried placing !! inside a function's signature:
what <- function(x = !!a) x
a <- "this"
what()

these kinds of interaction with the original language are bound to happen (even if this is a contrived example), and I believe it's a source of confusion.

  • Is there any hope to get new symbols introduced into base R (parser), in the long term?

  • we've seen a wide variety of short-lasting experiments over the past few years in various packages such as dplyr; is there a clear sense that this new paradigm / notation will last? It's quite frustrating to write code that breaks after just a few months/years because the authors have completely changed the syntax. I appreciate that all of this is work in progress, and no-one is forced to use any of these packages, but perhaps some drastic changes should justify creating an entirely new package (dplyr2) so that the old syntax survives.
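On the question of what happens to `!!` in the wrong context, a small sketch in base R plus rlang: the parser always produces two nested calls to `!`, and only a quoting function such as rlang::expr() treats that pattern as unquoting. This is why what() above fails once its default is evaluated: `!"this"` is an error in base R.

```r
# Outside a quoting function, !! is base R's `!` applied twice
b <- 5
!!b             # !5 is FALSE, then !FALSE is TRUE
#> [1] TRUE

# A character value makes base `!` fail, just as the default
# `x = !!a` in what() does once it is finally evaluated
a <- "this"
try(!!a)

# Inside a quoting function such as rlang::expr(), the same two `!`
# calls are intercepted and treated as unquoting
library(rlang)
expr(paste(!!a))
#> paste("this")
```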

Please please do publish those blog posts and shoot a link.
I'm ready to change my mind as soon as I see examples that show how easy it is to do some cool stuff without having to understand the magic under the hood.

Another small thing that I'm slowly realising is that I'm coming into tidyeval with some bad habits.

For example, as someone who has done an analysis with one column, then thought, "Actually, I'd like to repeat that analysis with ten other columns," my first instinct is still to reach for a for loop. The programming vignettes appear to start with the assumption that you're going to shift the block of analysis into a function and then pass a list of columns into that function.

That's probably the smarter way to do things, and someone who's been trained tidyverse first might go that way first, but it's not the instinctual way to do it for many users. I think those users, like me, have probably read these vignettes and thought, "But what if I don't want to write a function at all?" Addressing the ultimate problems that writing a function might solve would help a lot.
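For readers who would rather keep a for loop, a sketch assuming current rlang and dplyr: convert each column name string to a symbol and unquote it, with no wrapper function required. The column list is illustrative.

```r
library(rlang)
library(dplyr)

cols <- c("mpg", "hp", "wt")  # columns to repeat the analysis over
results <- list()

for (col in cols) {
  col_sym <- sym(col)         # turn the column name string into a symbol
  results[[col]] <- summarise(
    mtcars,
    avg   = mean(!!col_sym),
    stdev = sd(!!col_sym)
  )
}

results$hp
```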

@rensa makes an excellent point that the use of functions (or of purrr in some form) is key here.

I am probably guilty of trying to understand tidyeval before making functions a natural part of the workflow. When the name of an input column changes I tend to either just rename it at the start of the script or do a find/replace instead of thinking how to use the column names as inputs to functions.

I feel a little bad reviving an aging thread, but I was inspired in part by this thread to take up the rstats blogging challenge again. Over the Thanksgiving break I wrote up two blog posts about quasi-quotation/tidy evaluation.

First one provides a way of thinking about quasi-quotation through an analogy to re-writing recipes: http://blog.jalsalam.com/posts/2017/quasi-quotation-as-meta-recipe/

Second one explores a few use cases of quasi-quotation: http://blog.jalsalam.com/posts/2017/quasi-quotation-applications/

As you will see from my default blogdown theme, I'm still a newbie to all this, but hopefully there is something useful to others here learning rlang.

@jalsalam I've only looked at the 2nd one so far and I really really love your approach, you're making me reconsider my position. Thanks a lot for sharing.
I have a question (2 actually) about the sym function. Is that from rlang? And is there a more basic equivalent?
Gonna read the 1st one now.
Edit: Done. Congrats, I have no choice but to dig deeper now, as I need to understand the difference between quo, expr, sym, etc.
I salute your popularization skills!

Good to know it makes sense to you. In answer to your question, yes, sym is from rlang. While some of the tidyeval functions are also exported by dplyr, sym is not. I think that for dplyr-programming, sym is the most basic because unlike expr and quo it can only store a single variable symbol. It takes a string, which makes it easy to use with shiny as in the applications post.
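A sketch of that pattern, assuming current rlang and dplyr. In a Shiny app the chosen column usually arrives as a string, for example from selectInput(); here a plain variable stands in for it. Base R's as.name() builds the same symbol, so it is the closest base equivalent of sym().

```r
library(rlang)
library(dplyr)

# Stand-in for a string such as a Shiny input value
selected <- "cyl"

col <- sym(selected)
col
#> cyl

# Base R builds the same symbol
identical(col, as.name(selected))
#> [1] TRUE

# The symbol can be unquoted into any quoting dplyr verb
count(mtcars, !!col)
```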