Why can't ggplot2 use %>%?

mara · January 19, 2018, 11:27pm

What is the reason that ggplot2 can't/doesn't use the pipe operator?

I know it can be done with ggpipe, but it's not something that's part of ggplot2 itself.

rensa · January 19, 2018, 11:58pm

I guess one (perhaps small) reason is that %>%, as it's currently defined, only works with a function on the right-hand side. If you're overloading the + operator to do whatever you want (or creating a custom one like %+%), you can allow things like adding a list of ggplot components:

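library(ggplot2)

# several layers bundled in one list...
my_geoms = list(
  geom_point(),
  geom_line(),
  geom_smooth())

# ...can be added to a plot with a single `+`
ggplot(mtcars, aes(mpg, wt)) + my_geoms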

I don't think this would work with the pipe %>% unless you redefined it to work that way—and that could maaaybe break other code?

alistaire · January 20, 2018, 3:32am

I mean, it can, if you really want:

`%>%` <- function(lhs, ...){
    pipes <- match.call()    # available from methods
    UseMethod("%>%", lhs) 
}
`%>%.default` <- function(lhs, ...){
    # hacky, but...works
    with(list(`%>%` = magrittr::`%>%`), lhs %>% (pipes[[3]]))
}
`%>%.gg` <- ggplot2:::`+.gg`

1:3 %>% sum %>% seq(1, .) %>% {. > 3}
#> [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE

library(ggplot2)

mtcars %>%
    ggplot(aes(mpg, wt)) %>% 
    geom_point() %>% 
    geom_smooth(method = 'lm')

Whether it's a good idea is another matter. ggplot functions are nouns, so adding really makes more sense than piping, which is for passing a noun into a series of verbs. Newer, piped graphics packages frequently use verb names (e.g. plotly::add_lines) so pipes make more sense.

mara · January 20, 2018, 2:03pm

^ an excellent descriptor of how I feel about any of my (very briefly-lived) attempts to use a pipe in ggplot2.

Hadley added a more informative error message yesterday for those moments when some of us inevitably make the mistake of piping ggplot.

github.com/tidyverse/ggplot2/blob/f61bfd620037b7e1c816129469b106443651a4dc/R/layer.r#L92
            warning("`show.legend` must be a logical vector.", call. = FALSE)
            show.legend <- FALSE
          }
          
          data <- fortify(data)
          if (!is.null(mapping) && !inherits(mapping, "uneval")) {
            msg <- paste0("`mapping` must be created by `aes()`")
            if (inherits(mapping, "ggplot")) {
              msg <- paste0(
                msg, "\n",
                "Did you use %>% instead of +?"
              )
            }
          
            stop(msg, call. = FALSE)
          }
          
          if (is.character(geom))
            geom <- find_subclass("Geom", geom, parent.frame())
          if (is.character(stat))
            stat <- find_subclass("Stat", stat, parent.frame())

dchilders · January 20, 2018, 4:42pm

My understanding is that ggplot2 was created to allow for more readable, composable R code. If the pipe had existed, ggplot2 would never have split from the original ggplot package:

https://www.reddit.com/r/dataisbeautiful/comments/3mp9r7/im_hadley_wickham_chief_scientist_at_rstudio_and/cvi19ly/

dpuddeph · January 25, 2018, 1:08pm

The way I think of it is that the two operators, %>% and +, do different kinds of operations. With %>% we pass the value on the left into a function on the right. With + in ggplot we add elements to a plot. Conceptually, this is not the same as passing the previous elements of a plot into a geom function.
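A small sketch of the two behaviors described above, assuming ggplot2 and magrittr are installed and using the built-in mtcars data: the pipe rewrites `x %>% f(y)` as `f(x, y)`, while `+` appends a component to an existing plot object (checked here by counting the entries in the plot's `layers` element).

```r
library(ggplot2)
library(magrittr)

# %>% passes the left-hand value in as the first argument of the call on the right:
# `mtcars %>% head(3)` is evaluated as `head(mtcars, 3)`
piped  <- mtcars %>% head(3)
direct <- head(mtcars, 3)

# + adds a component to an existing plot object; the layer is appended to p$layers
p  <- ggplot(mtcars, aes(mpg, wt))
p2 <- p + geom_point()

identical(piped, direct)  # TRUE
length(p$layers)          # 0
length(p2$layers)         # 1
```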

hadley · January 29, 2018, 7:37pm

I think it's worth unpacking this question into a few smaller pieces:

Should ggplot2 use the pipe? IMO, yes.
Could ggplot2 support both the pipe and plus? No.
Would it be worth it to create a ggplot3 that uses the pipe? No.

Should ggplot2 use the pipe?

The first implicit question is should ggplot2 use the pipe? I think the answer is yes:

I think the pipe is absolutely the right interface. It is a consistent principle that applies in many more situations, and because it's just syntactic sugar for function composition, you can still compose small pieces in other ways.
Switching from %>% to + is a frequent source of errors (including for me!)
The pipe avoids the poor match of the semantics of addition to ggplot2. You usually expect that x + y equals y + x and that x + (y + z) equals (x + y) + z. Neither of these is true (in general) for ggplot2.
I think it's fine to have a pipe-y interface based around nouns instead of verbs. keras is a good example - I don't think there would be any significant benefit to renaming (e.g.) layer_dense() to add_layer_dense().
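To make the point about ordering concrete, here is a minimal sketch, assuming ggplot2 and the built-in mtcars data (`first_geom()` is a throwaway helper invented for this example, not a ggplot2 function): the same two layers added in opposite orders produce plots whose layer lists, and therefore drawing order, differ.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(mpg, wt))

# Same two layers, added in opposite orders
points_then_line <- p + geom_point() + geom_line()
line_then_points <- p + geom_line() + geom_point()

# Hypothetical helper: class name of the geom in the first (bottom-most) layer
first_geom <- function(plot) class(plot$layers[[1]]$geom)[1]

first_geom(points_then_line)  # "GeomPoint"
first_geom(line_then_points)  # "GeomLine"
```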

(@rensa points out one nice feature of the + interface: you can add multiple components by putting them in a list. But magrittr has an equivalent technique: my_geoms <- . %>% geom_point() %>% geom_line() %>% geom_smooth(). And I think that's an improvement because it uses ideas that can be applied in more contexts.)
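The magrittr technique mentioned here is a functional sequence: a pipeline that starts with `.` is captured as a reusable function instead of being evaluated immediately. A minimal sketch with base functions (the name `root_of_sum` is just for illustration), since ggplot2 layers themselves can't be piped this way today:

```r
library(magrittr)

# A pipeline starting with `.` is not evaluated; magrittr turns it into a function
root_of_sum <- . %>% sum() %>% sqrt()

root_of_sum(c(1, 3, 5))    # 3
c(4, 5) %>% root_of_sum()  # 3
```

The resulting function can be called directly or dropped into another pipeline, which is the "applies in more contexts" property the post refers to.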

As an interesting historical anecdote, ggplot (the precursor to ggplot2) was written in a functional style that could have used the pipe (if the pipe had existed). To explore this idea a little bit, I brought ggplot back to life as ggplot1:

library(ggplot1)
library(magrittr)  # provides %>%

mtcars %>% 
  ggplot(list(x = mpg, y = wt)) %>% 
  ggpoint()

Could ggplot2 support both `+` and `%>%`?

So if ggplot2 should use the pipe, could it? Would it be possible to allow both + and %>%?

I'm pretty certain the answer is no:

The first two arguments to all the geoms are currently mapping and data.
For the pipe to work, the first argument would need to be plot.
It would be possible to change the definition of the pipe specially to make it work with ggplot2, but that is unappealing because it would require changing a general tool to support a specific package.
It's almost certainly possible to use some deep metaprogramming magic to tell when the pipe is being used and somehow offset every argument one place over. This is likely to be hard to implement, fragile, slow, and hard to document.
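The point about `mapping` and `data` coming first can be checked directly. A sketch, assuming a current ggplot2 and magrittr: the first two formals of `geom_point()` are `mapping` and `data`, so `plot %>% geom_point()` hands the plot to `mapping`, and ggplot2 stops with an error along the lines of the message quoted earlier in the thread (the exact wording varies between ggplot2 versions, so it isn't reproduced here).

```r
library(ggplot2)
library(magrittr)

# The first two formal arguments of a geom are mapping and data...
names(formals(geom_point))[1:2]  # "mapping" "data"

# ...so piping a plot into a geom passes the plot as `mapping`,
# which ggplot2 rejects; capture the error message instead of stopping
res <- tryCatch(
  ggplot(mtcars, aes(mpg, wt)) %>% geom_point(),
  error = function(e) conditionMessage(e)
)
res
```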

Would it be better to create ggplot3?

If we can't make the pipe work with ggplot2, maybe it's time for ggplot3? ggplot3 could behave identically to ggplot2 in every way, except that it would compose plots using %>% instead of +. This would solve the pipe problem but would come with some major downsides:

ggplot3 would need substantial (if fairly formulaic) changes to almost every function. This would be a lot of work.
What would happen when someone reported a bug in ggplot2? Would I fix it only in ggplot3 and require users to upgrade? That seems unfair to ggplot2 users, so for every change, I'd need to make it simultaneously in ggplot2 and ggplot3, basically doubling all future development work.
Similarly, ggplot3 would create a fork in all other documentation (e.g. Stack Overflow and the ggplot2 book): you wouldn't be able to immediately apply ggplot2 answers to ggplot3, and new answers created for ggplot3 wouldn't immediately apply to ggplot2.
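The "substantial (if fairly formulaic)" change can be sketched as a wrapper that moves the plot to the first argument. `plot_first()`, `geom_point_p()` and `geom_smooth_p()` are hypothetical names made up for this example; they are not part of ggplot2 or of any real ggplot3. A genuine rewrite would need the equivalent change for every exported function, which is where the maintenance cost described above comes from.

```r
library(ggplot2)
library(magrittr)

# Hypothetical helper: turn a layer constructor into a plot-first function,
# so the plot arrives as the first argument and can be piped in
plot_first <- function(layer_fun) {
  function(plot, ...) plot + layer_fun(...)
}

# Hypothetical pipe-friendly versions of two geoms
geom_point_p  <- plot_first(geom_point)
geom_smooth_p <- plot_first(geom_smooth)

p <- mtcars %>%
  ggplot(aes(mpg, wt)) %>%
  geom_point_p() %>%
  geom_smooth_p(method = "lm")

length(p$layers)  # 2
```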

Overall, I think making this change just to use the pipe is not worthwhile.

JohnMount · January 30, 2018, 1:43am

The development version of wrapr can also use S3 dispatch to pipe through ggplot2. It also has a "cut in front" dispatch on the right method for treating nouns/objects as if they were functions.

pssguy · January 30, 2018, 3:34am

Well, could you use the opportunity to incorporate into a new core product some of the best of the extensions?
With well over 100 gg... packages, it might help if the best were included in ggplot3.

hadley · January 30, 2018, 2:07pm

Stealing the best ideas from extension packages is not a good way to grow an ecosystem!

pssguy · January 31, 2018, 3:48pm

Why mention stealing?!

If you offered authors the opportunity to have their code incorporated and recognized, rather than just being one of hundreds of extensions, they might be flattered.

On a mini-scale, it is like the general concern that valuable R packages are being lost among the 12,000+ out there.
Obviously sites like the extensions gallery are valuable.

Maybe experienced users of ggplot extensions could come up with lists of their favourites, and a kind of ggplot galaxy could evolve.

hadley · February 1, 2018, 4:44pm

All of the things that you describe can already happen. I don't think there's any advantage to making me the sole gatekeeper.

pssguy · February 1, 2018, 5:01pm

Certainly not. You have enough on your plate already. Have a great conference!

jtelleriar · February 2, 2018, 3:07pm

Interpreting the + as "another layer in the chart" (which does not depend on any specific order) and the pipe %>% as "then" for passing objects to a function, I actually like the + symbol in the ggplot2 family.

rpruim · February 7, 2018, 2:06am

Except order does matter when adding layers to a plot. So perhaps you prefer %>% after all?

rpruim · February 7, 2018, 2:09am

Note: If you like pipes (and formulas), you might take a look at ggformula, which provides a formula interface (somewhere between what lattice uses and what ggvis uses) to ggplot2 and uses %>% rather than +. Version 0.6.2 just went to CRAN this week.

tomhopper · February 15, 2018, 8:33pm

FWIW, the arguments against ggplot3, especially those below, are almost exactly the arguments users of graphics and lattice used against ggplot2. They also have the same structure as arguments that lead to major companies being displaced by newer and more agile innovators in their industry. I think we should expect that, at some point, someone will release a graphics library that uses pipes, and eventually ggplot2 will be displaced by such a library. Whether that library germinates in R, python or Julia remains to be seen, but we can bet that it won't be directly part of the tidyverse because of the overhead cost to the developers. That's a shame, but there's no easy way out of the competency trap.

hadley · February 15, 2018, 8:54pm

I think they're the sort of arguments that prevent things like the Python 2 -> Python 3 mess. If you're going to require a bunch of people to rewrite a bunch of code, there needs to be a big payoff, and stylistic consistency isn't big enough.

d8aninja · February 16, 2018, 12:58am

I guess I'm confused as to why this doesn't satisfy?

library(ggplot2)
library(dplyr)

data.frame(x = rnorm(10), y = rnorm(10)) %>% ggplot(aes(x,y)) + geom_point()

mara · February 16, 2018, 12:19pm

Oh, it does— don't you worry!
