Why should we use pipe to simplify our code?

Hi guys,
I don't understand why we should use the pipe to simplify our code.
I think it makes the code look more complex and more difficult to read. Can someone explain it and give some examples, please?
Thank you all!

assuming you have the data in


is this easier to understand ...

  ), year, month, day),
  rank(desc(arr_delay)) < 10

....than this ?

flights %>%
  ) %>%
  group_by(year, month, day) %>%
  filter(rank(desc(arr_delay)) < 10)

It's a choice, not a requirement. If you don't like pipes, simply don't use them. Some agree with your view, others don't.


I agree that whether to use pipes is a choice (unless, for example, you're in a class where one style is required over another!).

The argument that I've most frequently heard for using pipes, and the one that I personally find to ring true, is that a pipe puts operations in the order that you're doing them. Using a pipe brings your code closer to natural language syntax so that you can "read" it more easily. By contrast, without a pipe, you often end up "nesting" functions, so you effectively have to read the code from the inside out. That doesn't feel very natural.

Here's an example. I'll use the mtcars dataset, which comes built into R. Let's say I want to compare, among 4-cylinder cars, how many gallons of gas I would need for a 75-mile trip.

With the piped code, you can roughly translate %>% into English as "and then" as you read the code.

library(dplyr) # load dplyr for the pipe and other tidy functions
data(mtcars) # load the mtcars dataset

df <- mtcars %>% # take mtcars, AND THEN...
    filter(cyl == 4) %>% # filter it to four-cylinder cars, AND THEN...
    select(mpg) %>% # select only the mpg column, AND THEN...
    mutate(car = row.names(.), # add a column for the car name
           gallons = 75 / mpg) # and the gallons used on a 75-mile trip (miles / mpg)

I find that pretty intuitive to read. As an alternative, here's how I might approach the same problem without using a pipe.

mtcars$car <- row.names(mtcars) # add a car name column based on the row names
df <- as.data.frame(mtcars[mtcars$cyl == 4, c("car", "mpg")]) # filter to 4-cylinder cars and select the mpg and car name columns
df$gallons <- 75 / df$mpg # calculate number of gallons for a 75-mile trip (miles / mpg)

Now of course, what I just wrote might be somewhat biased, because I'm much more comfortable using pipes than not. There could be a much better way to do it without a pipe. But personally, I don't like having to read lines like df <- as.data.frame(mtcars[mtcars$cyl == 4, c("car", "mpg")]) because there are so many steps nested. I also don't like that the name of the data frame has to be repeated (mtcars$cyl), whereas with a pipe and filter(), you can just type the column name without the $.
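For what it's worth, here is one possible base R version that avoids repeating the data frame name. It is only a sketch: the name cars4 is made up for illustration, and it reads from datasets::mtcars so that any earlier changes to a local copy of mtcars don't get in the way.

```r
# Base R only: no pipe and no dplyr (one of several ways to write this)
cars4 <- subset(datasets::mtcars, cyl == 4, select = mpg)  # 4-cylinder rows, mpg column only
cars4$car <- row.names(cars4)                              # car names are stored as row names
cars4 <- transform(cars4, gallons = 75 / mpg)              # gallons for a 75-mile trip (miles / mpg)
head(cars4)
```

subset() and transform() both let you refer to columns by bare name, much like filter() and mutate() do, so the $ repetition is not really tied to whether or not you use a pipe.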

I hope that's a helpful illustration of why the pipe can be useful and can help make code more readable. But still, I'm not here to evangelize, and if non-piped code works better for you, then that's completely fine! It's a personal preference. You might end up having a leg up when it comes to function-writing and package development, where using pipes sometimes makes things a little harder.



I think what you've shown here is an example of dplyr vs. base R, rather than pipe vs. no pipe. Incidentally, base R will soon include its own pipe operator, |>.
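As a rough sketch of what that could look like, assuming a version of R that already ships |> (R 4.1.0 or later) and using only base functions (the name cars4 is just for illustration):

```r
# Assumes R >= 4.1.0, where the native pipe |> is available
cars4 <- datasets::mtcars |>
  subset(cyl == 4, select = mpg) |>   # keep 4-cylinder cars and the mpg column
  transform(gallons = 75 / mpg)       # gallons for a 75-mile trip (miles / mpg)
head(cars4)
```

So the pipe and dplyr are separate choices: you can pipe base R functions, and you can call dplyr functions without a pipe.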

A non-piped version of your dplyr example would be:

df2 <- mutate(
  select(
    filter(tibble::rownames_to_column(mtcars, "car"), cyl == 4),
    mpg, car
  ),
  gallons = 75 / mpg # miles / mpg gives gallons for a 75-mile trip
)

I cheated a bit with the sequence of converting the rownames to a column, but hopefully the logic is clear.
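If you want to convince yourself that the two styles really do the same work, a quick check might look like the sketch below. It assumes dplyr and tibble are installed; the names cars, piped and nested are made up for this example.

```r
library(dplyr)

# Start from a clean copy so earlier edits to mtcars do not interfere
cars <- tibble::rownames_to_column(datasets::mtcars, "car")

# Piped version: read top to bottom
piped <- cars %>%
  filter(cyl == 4) %>%
  select(mpg, car) %>%
  mutate(gallons = 75 / mpg)

# Nested version: read from the innermost call outwards
nested <- mutate(
  select(filter(cars, cyl == 4), mpg, car),
  gallons = 75 / mpg
)

# Compare the two results
all.equal(piped, nested)
```

The pipe only changes how the calls are written, not which functions run, so I would expect the comparison to report no differences.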


Thank you very much! Now I can see the difference between code written with the pipe and without it.


Here you can find a related discussion:
