Using `:=` for enhanced assignments, bad idea?

#1

Hi all,

I'm thinking of using the following function for assignments that require repeating the variable name on both the lhs and the rhs.

Pipes avoid this in many cases, and that's one of the main reasons they're so appreciated, but we can't leverage them in regular, simple assignments.

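```r
`:=` <- function(e1, e2) {
  mc <- match.call()
  # turn `lhs := rhs` into a call to the primitive `<-`
  mc[[1]] <- quote(.Primitive("<-"))
  # replace every µ in the rhs with the lhs expression
  mc[[3]] <- eval(substitute(substitute(e2, list(µ = mc[[2]])), list(e2 = mc[[3]])))
  # run the rewritten assignment in the caller's environment
  eval.parent(mc)
}

some_long_variable_name <- 3
some_long_variable_name := µ + 2
some_long_variable_name
# [1] 5
```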

It's basically like `some_long_variable_name %<>% {. + 2}`, but simpler, and there's no need to attach magrittr.

I thought of using `.`, but it would interfere with pipe chains, so I went with `µ`; `..` would work too.
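The heart of the function is the nested `substitute()`. Stripped down to that single rewriting step, it behaves like the sketch below. It uses the common `do.call(substitute, list(expr, env))` idiom instead of the nested form, which as far as I can tell is equivalent but is not the code above verbatim; `lhs`, `rhs` and `rewritten` are illustrative names only:

```r
# Illustrative names: the two sides of `some_long_variable_name := µ + 2`
lhs <- quote(some_long_variable_name)
rhs <- quote(µ + 2)

# substitute() swaps every symbol µ in `rhs` for the lhs expression.
# do.call() hands `rhs` over as an already-built expression, because
# substitute() would otherwise take its first argument literally.
rewritten <- do.call(substitute, list(rhs, list(µ = lhs)))
rewritten
# some_long_variable_name + 2
```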

Would it be robust? Would it conflict with the tidyverse's or data.table's use of `:=`? Is it for interactive use only, or would it be safe to use in functions or even in package code (as an unexported helper)?

I know I'm in shady territory here, so I'd appreciate some feedback.


#2

Let me get my crotchety old man reaction out of the way: I don't like this.

Now, let's see if it causes trouble for rlang or data.table.

```r
a <- "hello"
a := paste(µ, "world")
a
# [1] "hello world"

library(rlang)
dots_list(a := "dog", !!a := "cat")
# $`a`
# [1] "dog"
# 
# $`hello world`
# [1] "cat"

library(data.table)
data.table(x = 1)[, a := x + 2][, c("aa") := a + x][]
#    x a aa
# 1: 1 3  4
```

So far, so good. I've never used rlang myself, so it probably needs more testing on common cases. You could just rename it to `%:=%` to avoid name overlaps.


Personal opinion below, feel free to ignore.

Why I don't like it: non-standard evaluation (substitute, quote, eval, and playing with environments) adds a lot of complexity and many chances for silent errors. In return, it should get you something very valuable that isn't otherwise achievable.
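One concrete instance of that risk (an illustration, not something raised in the thread): `:=` pastes the lhs expression into the rhs, so anything with side effects in the lhs runs twice, exactly as in the long-hand `v[i] <- v[i] + 1` and unlike `+=`-style operators in other languages. The sketch assumes the `:=` definition from #1; `next_index()` is a hypothetical helper that only counts how often it is called:

```r
# `:=` as defined in #1
`:=` <- function(e1, e2) {
  mc <- match.call()
  mc[[1]] <- quote(.Primitive("<-"))
  mc[[3]] <- eval(substitute(substitute(e2, list(µ = mc[[2]])), list(e2 = mc[[3]])))
  eval.parent(mc)
}

calls <- 0
# hypothetical helper: returns index 1 and records each call
next_index <- function() {
  calls <<- calls + 1
  1
}

v <- c(10, 20)
v[next_index()] := µ + 1   # rewritten to v[next_index()] <- v[next_index()] + 1
v
# [1] 11 20
calls                      # the lhs index expression ran twice
# [1] 2
```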

A much simpler fix is to use a temporary variable to save keystrokes. It's very similar to what you're doing with µ.

```r
some_long_variable_name <- 3
some_long_variable_name <- {
  .. <- some_long_variable_name
  .. <- .. + 2
}
some_long_variable_name
# [1] 5
```

This way requires one more instance of some_long_variable_name than yours, but it uses only basic R syntax.


#3

Many thanks for your feedback. I thought about `%<-%`, but the operator precedence is different and I'd need parentheses in my example above, so this adds 4 characters (`%%()`) and makes it more awkward than I'd like.
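The precedence difference can be checked without evaluating anything, since `quote()` shows how R parses each spelling. A minimal sketch; `%:=%` is only the name suggested in #2, and nothing here defines it:

```r
# `:=` is parsed like `<-`, so the whole rhs is its second argument
e_colon <- quote(x := µ + 2)
e_colon[[1]]       # outer call is `:=`

# %op% operators bind tighter than `+`, so this is (x %:=% µ) + 2
e_pct <- quote(x %:=% µ + 2)
e_pct[[1]]         # outer call is `+`

# parentheses restore the intended grouping
e_pct_paren <- quote(x %:=% (µ + 2))
e_pct_paren[[1]]   # outer call is `%:=%`
```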

The complexity and silent errors do worry me. My code does some very straightforward editing of the call and executes it right back in its original environment (I can even do things like `l[[1]]$z := ...` or use replacement (`fun<-`) functions like `levels(x) := ...`), so I don't see what could go wrong, but I could be mistaken...
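A quick check of the two cases mentioned, assuming the `:=` definition from #1 (`l` and `f` are throwaway example objects). Because the rewritten call is an ordinary `<-` assignment, R's usual complex-assignment machinery handles nested subsetting and replacement functions such as `levels<-`:

```r
# `:=` as defined in #1
`:=` <- function(e1, e2) {
  mc <- match.call()
  mc[[1]] <- quote(.Primitive("<-"))
  mc[[3]] <- eval(substitute(substitute(e2, list(µ = mc[[2]])), list(e2 = mc[[3]])))
  eval.parent(mc)
}

l <- list(list(z = 1))
l[[1]]$z := µ * 10        # becomes l[[1]]$z <- l[[1]]$z * 10
l[[1]]$z
# [1] 10

f <- factor(c("a", "b"))
levels(f) := toupper(µ)   # becomes levels(f) <- toupper(levels(f))
levels(f)
# [1] "A" "B"
```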

I've started using `.` a lot as a temp variable and I like it, but there are also many single operations where you want to increment a variable or change one element of an element of a list, and a compact syntax would make these more readable without cluttering the workspace.


#4

This is just my personal opinion, but variables are best kept immutable unless there is a really compelling reason (e.g. running out of memory). Especially in an interactive language like R, it is far too easy to accidentally run a mutating line twice. In your example, some_long_variable_name could accidentally become 7 through human error.
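The double-run hazard is easy to reproduce with the `:=` from #1 (a sketch; the repeated line stands in for an accidental re-run):

```r
# `:=` as defined in #1
`:=` <- function(e1, e2) {
  mc <- match.call()
  mc[[1]] <- quote(.Primitive("<-"))
  mc[[3]] <- eval(substitute(substitute(e2, list(µ = mc[[2]])), list(e2 = mc[[3]])))
  eval.parent(mc)
}

some_long_variable_name <- 3
some_long_variable_name := µ + 2
some_long_variable_name := µ + 2   # same line run again by mistake
some_long_variable_name
# [1] 7
```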

If we choose to mutate a variable, it is best for the mutation to be idempotent, so that re-applying it produces the same result as the first application. Examples include adding a column to a data.frame based on other existing columns, or cleaning up a column by replacing NA with a default value; these operations are typically idempotent. On the other hand, replacing a column by converting its unit from km to m (multiplying by 1000) is not idempotent.
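The distinction can be checked mechanically: a mutation `f` is idempotent when `f(f(d))` is identical to `f(d)`. A minimal base-R sketch; the data frame, its `dist` column (in km) and the helper functions are hypothetical stand-ins for the examples above:

```r
df <- data.frame(dist = c(1.5, NA, 3))  # hypothetical distances in km

# hypothetical helpers mirroring the three examples
fill_na <- function(d) { d$dist[is.na(d$dist)] <- 0; d }      # replace NA: idempotent
add_m   <- function(d) { d$dist_m <- d$dist * 1000; d }       # derived column: idempotent
km_to_m <- function(d) { d$dist <- d$dist * 1000; d }         # in-place unit change: not idempotent

is_idempotent <- function(f, d) identical(f(f(d)), f(d))
is_idempotent(fill_na, df)   # TRUE
is_idempotent(add_m, df)     # TRUE
is_idempotent(km_to_m, df)   # FALSE
```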

Sorry for being slightly off topic, but in my experience mutable variables are among the top reasons for irreproducible code, and they impose a mental tax on anyone else reading your code.

P.S. Doesn't RStudio (or your IDE of choice) provide robust code completion? I like long, descriptive variable names as long as the IDE does a decent job of autocompletion, so that I rarely type more than a few characters. You can also lower the autocompletion delay and the minimum number of characters required if you find autocompletion not fluid enough.



#5

You raise a good point and I mostly agree.

Usually I'd rather create a new variable or use piped operations than overwrite a variable, and when I need one, I create a temp variable that I usually name `.` or give a name ending in `_`, and I use it only in small blocks.

Exceptions where this isn't possible include `for` loops of the kind that can't be replaced by `*apply` or `Reduce` calls.
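As an illustration of that kind of loop (a sketch assuming the `:=` from #1; the numbers are arbitrary): the accumulator is overwritten on each pass and the loop stops early with `break`, which is awkward to express with `Reduce()`:

```r
# `:=` as defined in #1
`:=` <- function(e1, e2) {
  mc <- match.call()
  mc[[1]] <- quote(.Primitive("<-"))
  mc[[3]] <- eval(substitute(substitute(e2, list(µ = mc[[2]])), list(e2 = mc[[3]])))
  eval.parent(mc)
}

total <- 0
for (v in c(4, 8, 15, 16, 23, 42)) {
  if (total > 20) break   # early exit once the threshold is passed
  total := µ + v          # same as total <- total + v
}
total
# [1] 27
```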

Other cases could be rewritten without overwriting the variable, but overwriting is simply easier, especially in interactive use.

I like descriptive names but I also like compact code, and for me it's more about reading/cognitive load than typing.


#6

Some feedback after a few weeks of use.

It seems robust enough, as I haven't encountered any real issues yet, and it has helped me quite a bit to write more readable code.

In my opinion it's not just about typing laziness; it also helps being conceptually cleaner. If you're growing a data frame, for example, you might not want to see a copy of its name on the rhs: the `:=` operator tells you that you're overwriting or growing an object, and if you want to change its name you change it on the lhs only. I find it more satisfying.
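A small sketch of the data-frame case, assuming the `:=` from #1 (`results` is an illustrative name): the object's name appears once per assignment, on the lhs, and `µ` stands for its current value:

```r
# `:=` as defined in #1
`:=` <- function(e1, e2) {
  mc <- match.call()
  mc[[1]] <- quote(.Primitive("<-"))
  mc[[3]] <- eval(substitute(substitute(e2, list(µ = mc[[2]])), list(e2 = mc[[3]])))
  eval.parent(mc)
}

results <- data.frame(id = integer(0), value = numeric(0))
for (i in 1:3) {
  # same as results <- rbind(results, ...)
  results := rbind(µ, data.frame(id = i, value = i^2))
}
results
```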

Debugging hasn't been any more difficult, except that you cannot execute the right-hand side alone, but the same is true of `%<>%`.
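One possible workaround when debugging (a sketch, not part of the package): bind `µ` to the lhs's current value in a throwaway scope and run the rhs there. This should match what `:=` would compute as long as evaluating the lhs has no side effects; `preview` is an illustrative name:

```r
some_long_variable_name <- 3

# preview the rhs of `some_long_variable_name := µ + 2` without assigning
preview <- local({
  µ <- some_long_variable_name   # µ = current value of the lhs
  µ + 2                          # the rhs exactly as written after :=
})
preview                   # what the assignment would store
some_long_variable_name   # still unchanged
```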

I made it into a package that you can find here:

Install with: `devtools::install_github("moodymudskipper/dotdot")`.