How do we feel about `->` within a pipeline?

Generally, if I want to save something into an object when using the pipe, I do it at the beginning. For example:

library(tidyverse)
cty_mpg <- mpg %>% 
  group_by(cyl) %>% 
  summarize(cty_mean = mean(cty))

I read a blog post a while back (can't find it now, maybe someone else can?) that advocated using -> instead at the end of the pipeline. So the above example would turn into:

mpg %>% 
  group_by(cyl) %>% 
  summarize(cty_mean = mean(cty)) ->
  cty_mpg

This seems more natural because the last step after building up the pipeline is to store the result. But as much as I think it makes logical sense, I haven't adopted this style in my own code yet because it just feels... weird. I don't know, maybe I'm just stuck in my ways.

So I was hoping to get others' thoughts. Which approach do you prefer and why?

Assignment is way too important to be overlooked. Whenever my "code paragraph" starts with an assignment operator, I know that a new object is created. When it starts with just the pipe, I assume the result is either printed to the console or plotted at the end of the pipe, i.e. that I cannot refer to the result of this calculation later in the script.

Explicit <- helps me keep my code tidy and organized.

For myself, I advocate for the left assignment operator. When putting it at the start of a pipeline, I typically put it on its own line:

cty_mpg <- 
  mpg %>% 
  group_by(cyl) %>% 
  summarize(cty_mean = mean(cty))

My preference for this is based solely on the fact that it isolates the assignment, with everything pertaining to that assignment indented underneath it. Using the right assignment loses some of that clarity. While the code below is equivalent, it leaves the impression that all of the actions taken belong to `mpg`:

mpg %>% 
  group_by(cyl) %>% 
  summarize(cty_mean = mean(cty)) ->
  cty_mpg
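
On the "equivalent" point: R's parser rewrites `x -> y` as `y <- x`, so both styles end up as the same call. A minimal sketch in base R to check this (the names `a` and `b` are placeholders, not objects from the thread):

```r
# quote() captures an expression without evaluating it,
# so `a` and `b` do not need to exist.
right <- quote(a -> b)
left  <- quote(b <- a)

# Both are stored as a call to `<-` with target b and value a.
identical(right, left)  # TRUE
print(right)            # b <- a
```

So the choice between the two is purely about how the code reads, not about what it does.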

Another concern I have with right assignment (and this may be unfounded) is that I'm under the impression that right assignment operators are not common in the programming world. One of my students (pursuing a degree in computer science) would make use of the -> operator every chance he got because it was so different and intriguing. If right assignment is, indeed, a curiosity of R, then I would choose to avoid it based on the need for my code to be clear to programmers of other languages.

Thanks both, I like those thoughts, and I tend to agree. But I thought the latter was an interesting approach and wondered if others had adopted it. You provide good reasons not to adopt it, however. (And, as I said, it feels somewhat unnatural to me anyway.)

Although I don't use it I think -> is a neat option. If it's not an option in other languages then I find it all the better as one of R's quirks.

The indenting issue mentioned above looks to me the most serious from a code clarity point of view.

If I was going to use right assignment, I'd bring it back an indent level, just as people generally do with left assignment. But I usually use = or %<>%, so it's kind of a moot point for me :sweat_smile:
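
For anyone who hasn't met `%<>%`: it is magrittr's assignment pipe, which pipes the left-hand side through the right-hand side and writes the result back to the same name. A minimal sketch, assuming the magrittr package is installed (the vector `x` is made up for illustration):

```r
library(magrittr)

x <- c(4, 9, 16)

# Shorthand for: x <- x %>% sqrt()
x %<>% sqrt()

x  # 2 3 4
```

Note that this overwrites `x` in place, so it does not help when the result needs a new name.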

I generally avoid the right assignment for the reasons listed above. My main concern is readability of code, as I think it's helpful to see object names at the top of pipes used to create them.

I use `->` all the time. It is convenient for me for several reasons. For instance, if I need to split (or even just debug) a long pipeline, I only need to make one local change: replace one of the `%>%` with `-> result_so_far; result_so_far %>%`. The final result of the pipeline (naturally I have `-> result` at the end) will not change. Other methods of debugging pipelines, such as using `%T>%` or even a "state/writer monad", e.g. ``trace(`%>%`, tracer=...)``, to print or store intermediate results, also work, but injecting a variable that can stay there as long as needed without disturbing anything seems much easier.
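
The splitting trick described above might look like the sketch below. It reuses the `mpg` example from the first post and assumes dplyr and ggplot2 are installed; `result_split` is a name added here only so the two versions can be compared.

```r
library(dplyr)

mpg <- ggplot2::mpg  # the dataset used earlier in the thread

# Full pipeline, ending in right assignment.
mpg %>%
  group_by(cyl) %>%
  summarize(cty_mean = mean(cty)) ->
  result

# Same pipeline with one `%>%` replaced by
# `-> result_so_far; result_so_far %>%`.
mpg %>%
  group_by(cyl) -> result_so_far; result_so_far %>%
  summarize(cty_mean = mean(cty)) ->
  result_split

# The intermediate grouped data is now available for inspection,
# and the final result is unchanged.
identical(result, result_split)
```

This works because `%>%` binds more tightly than `->`, so everything before `-> result_so_far` is evaluated as one pipeline before being stored.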

I like right pipe. You can even build pipelines with just right pipe!