Why can't ggplot2 use %>%?

I think it's worth unpacking this question into a few smaller pieces:

  • Should ggplot2 use the pipe? IMO, yes.
  • Could ggplot2 support both the pipe and plus? No.
  • Would it be worth it to create a ggplot3 that uses the pipe? No.

Should ggplot2 use the pipe?

The first implicit question is: should ggplot2 use the pipe? I think the answer is yes:

  • I think the pipe is absolutely the right interface. It is a consistent principle that applies in many more situations, and because it's just syntactic sugar for function composition, you can still compose small pieces in other ways.

  • Switching from %>% to + is a frequent source of errors (including for me!)

  • The pipe avoids the poor match of the semantics of addition to ggplot2. You usually expect that x + y equals y + x and that x + (y + z) equals (x + y) + z. Neither of these is true (in general) for ggplot2.

  • I think it's fine to have a pipe-y interface based around nouns instead of verbs. keras is a good example - I don't think there would be any significant benefit to renaming (e.g.) layer_dense() to add_layer_dense().

(@rensa points out that one nice feature of the + interface is that you can add multiple components by putting them in a list. But magrittr has an equivalent technique: my_geoms <- . %>% geom_point() %>% geom_line() %>% geom_smooth(). And I think that's an improvement because it uses ideas that can be applied in more contexts.)
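To make the point about addition concrete, here is a minimal sketch using current ggplot2. The helper layer_geoms() is a name made up for this example, and it assumes the layers of a plot are reachable as p$layers. It shows that swapping the order of two added layers gives a different plot, and that adding a list of components behaves like adding them one at a time.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(mpg, wt))

# The same two layers, added in opposite orders.
p1 <- base + geom_point() + geom_line()
p2 <- base + geom_line() + geom_point()

# Hypothetical helper for this example: the geom class of each layer, in order.
layer_geoms <- function(p) {
  unname(vapply(p$layers, function(l) class(l$geom)[1], character(1)))
}

# Layers are stored (and drawn) in the order they were added,
# so the two plots are not the same object.
layer_geoms(p1)
layer_geoms(p2)

# Adding a list of components gives the same layers as adding them one by one.
p3 <- base + list(geom_point(), geom_line())
identical(layer_geoms(p1), layer_geoms(p3))
```

So + here really means "append to the plot", which is order-dependent in a way that ordinary addition is not.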

As an interesting historical anecdote, ggplot (the precursor to ggplot2) was written in a function style that could have used the pipe (if the pipe had existed). To explore this idea a little bit, I brought ggplot back to life as ggplot1:

library(ggplot1)

mtcars %>% 
  ggplot(list(x = mpg, y = wt)) %>% 
  ggpoint()

Could ggplot2 support both + and %>%?

So if ggplot2 should use the pipe, could it? Would it be possible to allow both + and %>%?

I'm pretty certain the answer is no:

  • The first two arguments to all the geoms are currently mapping and data.
    For the pipe to work, the first argument would need to be plot.

  • It would be possible to change the definition of the pipe specially to make
    it work with ggplot2, but that is unappealing because it would require
    changing a general tool to support a specific package.

  • It's almost certainly possible to use some deep metaprogramming magic to
    tell when the pipe is being used and somehow offset every argument one
    place over. This is likely to be hard to implement, fragile, slow,
    and hard to document.
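The argument-order problem is easy to see in a short sketch. It assumes ggplot2 and magrittr are installed; add_geom_point() is a hypothetical wrapper written for this example and is not part of ggplot2. It illustrates the kind of formulaic change that putting the plot first would require.

```r
library(ggplot2)
library(magrittr)

# The first two formal arguments of a geom are `mapping` and `data`.
names(formals(geom_point))[1:2]

# Piping a plot into a geom passes the plot as `mapping`,
# which ggplot2 rejects with an error.
piped <- tryCatch(
  ggplot(mtcars, aes(mpg, wt)) %>% geom_point(),
  error = function(e) e
)
inherits(piped, "error")

# Hypothetical wrapper (not in ggplot2): the plot comes first, so it pipes.
add_geom_point <- function(plot, ...) {
  plot + geom_point(...)
}

p <- ggplot(mtcars, aes(mpg, wt)) %>% add_geom_point(colour = "red")
length(p$layers)
```

A wrapper like this works for one geom, but every geom, stat, scale, and theme function would need the same treatment to get a consistent pipe-first interface.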

Would it be better to create ggplot3?

If we can't make the pipe work with ggplot2, maybe it's time for ggplot3? ggplot3 could behave identically to ggplot2 in every way, except that it would compose plots using %>% instead of +. This would solve the pipe problem but would come with some major downsides:

  • ggplot3 would need substantial (if fairly formulaic) changes to almost
    every function. This would be a lot of work.

  • What would happen when someone reported a bug in ggplot2? Would I fix it
    only in ggplot3 and require users to upgrade? That seems unfair to ggplot2
    users, so for every change, I'd need to make it simultaneously to ggplot2
    and ggplot3, basically doubling all future development work.

  • Similarly, ggplot3 would create a fork in all other documentation (e.g.
    Stack Overflow and the ggplot2 book): you wouldn't be able to immediately
    apply ggplot2 answers to ggplot3, and new answers created for ggplot3 wouldn't
    immediately apply to ggplot2.

Overall, I think making this change just to use the pipe is not worthwhile.
