Symbols that do not represent objects, why?

On another topic, "When should non-standard evaluation be used and why?", @nick mentioned that using symbols in a function argument that do not represent objects is now discouraged in the tidyverse.

Why is this? Will that capability be removed from the tidyverse in the future?

What about the case where the symbols are never meant to represent an object, for example a hypothetical function call like:

hfunc(source, a:1:q:z)

in a problem domain where a syntax like a:1:q:z is commonly used to specify a pattern, a process, or whatever.

If only symbols that represent objects are allowed, the function would need a signature something like

hfunc(source, c("a", "1", "q", "z"))

which would be really clumsy for a domain expert used to a:1:q:z.

BTW, enquo() and friends are quite happy to parse an argument like a:1:q:z...
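As a rough illustration of that point, here is a minimal sketch of how a function like the hypothetical hfunc() could capture a:1:q:z unevaluated and turn it into the character vector from the clumsier signature. It assumes the pattern is only ever a chain of `:` calls; base R's substitute() stands in for enquo() so the sketch has no package dependencies, and the helper flatten_colon() is made up for this example.

```r
# Hypothetical DSL-style function: `pattern` is captured unevaluated,
# so none of its symbols need to exist as objects.
hfunc <- function(source, pattern) {
  expr <- substitute(pattern)  # base R analogue of rlang::enquo()

  # `:` is left-associative, so a:1:q:z parses as ((a:1):q):z.
  # Walk the nested calls and collect each leaf as a string.
  flatten_colon <- function(e) {
    if (is.call(e) && identical(e[[1]], as.name(":"))) {
      c(flatten_colon(e[[2]]), flatten_colon(e[[3]]))
    } else {
      deparse(e)  # symbol `a` -> "a", number 1 -> "1"
    }
  }

  list(source = source, pattern = flatten_colon(expr))
}

res <- hfunc("my_data", a:1:q:z)
res$pattern
```

The domain expert writes a:1:q:z, and the function body works with c("a", "1", "q", "z") internally; the trade-off is that a:1:q:z no longer means what `:` means anywhere else in R.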

I think that the tidyverse opens R up to domain experts who are not statisticians but just want to poke at their data, do simple stats and make plots (maybe with two Y axes :grinning: ) and the recommendation (and maybe prohibition??) that symbols must represent objects closes this off to them a bit.

Thanks,
Dan


I believe it's because those symbols are usually infix functions, so notation like

library(tidyr)

mtcars %>% gather(var, val, mpg:carb, -disp)

is semantically problematic because

  • var and val refer to objects that haven't been created yet (the issue in the linked question), which in normal contexts would throw an error, and
  • : and - are functions, so the same code could be called in a different context with different results (the issue you're asking about, if I understand), e.g.
mpg <- 1
carb <- 10
disp <- 5

mpg:carb
#>  [1]  1  2  3  4  5  6  7  8  9 10

-disp
#> [1] -5

and for users, remembering that symbols mean different things in different contexts can become onerous, e.g. when new users try to figure out what symbols like ~, +, ., *, :, -, etc. mean in a model formula. To some extent we're willing to put up with it—there's not a great alternative for formulas, and tidyr notation is really handy—but as such contexts multiply, so does the cognitive load.
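Base R has the same context-dependence, which makes it easy to demonstrate without loading tidyr. The sketch below uses subset()'s `select` argument as a stand-in for tidyr's column selection (it is not the tidyselect machinery): subset() evaluates `select` in an environment where each column name is bound to its column position, so the very same `mpg:carb` and `-disp` expressions mean something different there than at the top level.

```r
mpg <- 1
carb <- 10
disp <- 5

# Ordinary evaluation: `:` and unary `-` act on the global values above.
seq_vals <- mpg:carb   # 1 through 10
neg_disp <- -disp      # -5

# Inside subset(select = ...), the names are looked up as column positions
# of mtcars first (mpg = 1, disp = 3, carb = 11), ignoring the globals.
cols <- names(subset(mtcars, select = mpg:carb))
cols_minus <- names(subset(mtcars, select = -disp))

seq_vals
cols
cols_minus
```

Reading either call correctly requires knowing which evaluation rules apply at that spot, which is exactly the cognitive load described above.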

I'm thinking more in terms of a DSL, so that an expert in a domain can use a syntax that is familiar to them and may already exist in that domain's specs, rather than an alternative to an existing syntax adopted just because it is cleaner, i.e. has less programming ceremony.

I think @nick mentioned that there are already cases that violate the advice to only use symbols that refer to objects that already exist, but that going forward that should not be the case.

Somewhere, someone came up with the rule of not using symbols for objects that don't exist. I'm sure they had some concrete reasoning for it; it would be good to know what it was.

It's a guideline, not exactly a rule. It's a good idea, because the expectation is that R goes to find the object to which a name refers and throws an error if it doesn't find anything. Breaking that expectation increases the learning curve and cognitive load for users, because they have to be aware of how code will be interpreted in a particular location. In some contexts that tradeoff may be worthwhile, but it shouldn't be done lightly.
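To make that expectation concrete, here is a small sketch contrasting the two behaviours. Standard evaluation looks a name up and errors when nothing is bound to it; a quoting function never looks the name up, so the same bare name passes silently. The function as_label() is hypothetical, written here with base R's substitute() and deparse().

```r
# Standard evaluation: R searches for the object and errors if it is absent.
err <- tryCatch(new_col_name, error = conditionMessage)

# Hypothetical quoting function: captures the expression instead of
# evaluating it, so a name that refers to nothing is accepted as a label.
as_label <- function(x) deparse(substitute(x))
lbl <- as_label(new_col_name)

err
lbl
```

The second call is how gather(var, val, ...) can accept var and val before they exist; the cost is that a reader can no longer tell from the call alone whether a name will be looked up.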

I think it's a good guideline too. I'd be more worried about it resolving the symbol and introducing a silent error than about it throwing an exception; at least with an exception you know something is wrong.
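A minimal sketch of that silent-error case, using base R's subset() as the data-masking function (an assumption for illustration; tidyverse verbs have their own lookup rules, and rlang offers .data/.env pronouns to make the lookup explicit). subset() looks for names in the data frame first and then falls back to the calling environment, so a typo in a column name only errors if no object of that name happens to exist.

```r
df <- data.frame(value = c(1, 5, 10))

# Intended filter: keeps the rows where value > 3.
intended <- subset(df, value > 3)

# Typo: `val` instead of `value`. With no `val` anywhere, R fails loudly.
loud <- tryCatch(subset(df, val > 3), error = conditionMessage)

# If an unrelated `val` happens to exist in the calling environment, the typo
# resolves to it silently: `100 > 3` is TRUE, so every row is kept.
val <- 100
silent <- subset(df, val > 3)

nrow(intended)
loud
nrow(silent)
```
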