What is the reason that ggplot2 can't/doesn't use the pipe operator?
I know it can be done with ggpipe, but it's not something that's part of ggplot2 itself.
I guess one (perhaps small) reason is that %>%, as it's currently defined, only works with a function on the right-hand side. If you're overloading the + operator to do whatever you want (or creating a custom one like %+%), you can allow stuff like adding a list of ggplot components:
my_geoms = list(
geom_point(),
geom_line(),
geom_smooth())
ggplot(my_data) + my_geoms
I don't think this would work with the pipe %>% unless you redefined it to work that way, and that could maaaybe break other code?
I mean, it can, if you really want:
`%>%` <- function(lhs, ...){
pipes <- match.call() # available from methods
UseMethod("%>%", lhs)
}
`%>%.default` <- function(lhs, ...){
# hacky, but...works
with(list(`%>%` = magrittr::`%>%`), lhs %>% (pipes[[3]]))
}
`%>%.gg` <- ggplot2:::`+.gg`
1:3 %>% sum %>% seq(1, .) %>% {. > 3}
#> [1] FALSE FALSE FALSE TRUE TRUE TRUE
library(ggplot2)
mtcars %>%
ggplot(aes(mpg, wt)) %>%
geom_point() %>%
geom_smooth(method = 'lm')
Whether it's a good idea is another matter. ggplot functions are nouns, so adding really makes more sense than piping, which is for passing a noun into a series of verbs. Newer, piped graphics packages frequently use verb names (e.g. plotly::add_lines), so pipes make more sense there.
^ an excellent descriptor of how I feel about any of my (very briefly-lived) attempts to use a pipe in ggplot2.
Hadley added a more informative error message yesterday for those moments when some of us inevitably make the mistake of piping ggplot.
My understanding is that ggplot2 was created to allow for more readable, composable R code. If the pipe had existed, ggplot2 would never have split from the original ggplot package.
The way that I have thought of it is that these two operators, %>% and +, are doing different types of operations. For %>% we are passing values from the left into a function on the right. For + in ggplot we are adding elements to a plot. Conceptually this is not the same as passing previous elements of a plot into a geom function.
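To make that difference concrete, here is a small sketch (assuming the ggplot2 and magrittr packages are installed; the variable names are only for illustration). It shows that x %>% f(y) is evaluated as f(x, y), which is why piping a plot into a geom hands the plot to the geom's first argument instead of adding a layer.

```r
library(ggplot2)
library(magrittr)

# x %>% f(y) is evaluated as f(x, y)
piped  <- c(1, 4, 9) %>% sqrt() %>% sum()
nested <- sum(sqrt(c(1, 4, 9)))
identical(piped, nested)

# So base_plot %>% geom_point() becomes geom_point(base_plot): the plot lands
# in geom_point()'s first argument (mapping), and ggplot2 raises an error.
base_plot <- ggplot(mtcars, aes(mpg, wt))
piped_layer <- tryCatch(base_plot %>% geom_point(), error = function(e) e)
inherits(piped_layer, "error")

# + instead appends the layer to the plot object.
added <- base_plot + geom_point()
length(added$layers)
```

The exact wording of the error depends on the ggplot2 version, so the sketch only checks that an error is raised.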
I think it's worth unpacking this question into a few smaller pieces:
The first implicit question is should ggplot2 use the pipe? I think the answer is yes:
I think the pipe is absolutely the right interface. It is a consistent principle that applies in many more situations, and because it's just syntactic sugar for function composition, you can still compose small pieces in other ways.
Switching from %>% to + is a frequent source of errors (including for me!).
The pipe avoids the poor match of the semantics of addition to ggplot2. You usually expect that x + y equals y + x and that x + (y + z) equals (x + y) + z. Neither of these is true (in general) for ggplot2.
I think it's fine to have a pipe-y interface based around nouns instead of verbs. keras is a good example: I don't think there would be any significant benefit to renaming (e.g.) layer_dense() to add_layer_dense().
(@rensa points out that one nice feature of the + interface is that you can add multiple components by putting them in a list. But magrittr has an equivalent technique: my_geoms <- . %>% geom_point() %>% geom_line() %>% geom_smooth(). And I think that's an improvement because it uses ideas that can be applied in more contexts.)
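For anyone who hasn't seen the ". %>%" trick: a pipeline that starts with the dot placeholder builds a reusable function rather than a value. The sketch below (assuming magrittr is installed) uses base functions as stand-ins, since ggplot2's geoms don't take a plot as their first argument; root_of_sum and composed are made-up names.

```r
library(magrittr)

# A pipeline that starts with `.` returns a function (a "functional sequence").
root_of_sum <- . %>% sum() %>% sqrt()
is.function(root_of_sum)

# Calling it runs the steps in order: sqrt(sum(x))
root_of_sum(c(9, 16))

# It is just sugar for ordinary function composition.
composed <- function(x) sqrt(sum(x))
identical(root_of_sum(c(9, 16)), composed(c(9, 16)))
```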
As an interesting historical anecdote, ggplot (the precursor to ggplot2) was written in a functional style that could have used the pipe (if the pipe had existed). To explore this idea a little bit, I brought ggplot back to life as ggplot1:
library(ggplot1)
mtcars %>%
ggplot(list(x = mpg, y = wt)) %>%
ggpoint()
+ and %>%?
So if ggplot2 should use the pipe, could it? Would it be possible to allow both + and %>%?
I'm pretty certain the answer is no:
The first two arguments to all the geoms are currently mapping and data.
For the pipe to work, the first argument would need to be plot.
It would be possible to change the definition of the pipe specially to make
it work with ggplot2, but that is unappealing because it would require
changing a general tool to support a specific package.
It's almost certainly possible to use some deep metaprogramming magic to
tell when the pipe is being used and somehow offset every argument one
place over. This is likely to be hard to implement, fragile, slow,
and hard to document.
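As a rough illustration of the "first argument would need to be plot" point, here is a sketch with two hypothetical wrappers, add_points() and add_smooth(). They are not part of ggplot2; they only show what a plot-first signature would look like (assuming ggplot2 and magrittr are installed).

```r
library(ggplot2)
library(magrittr)

# Hypothetical helpers (not ggplot2 functions): the plot comes first,
# and everything else is forwarded to the real geom.
add_points <- function(plot, ...) plot + geom_point(...)
add_smooth <- function(plot, ...) plot + geom_smooth(...)

p <- mtcars %>%
  ggplot(aes(mpg, wt)) %>%
  add_points() %>%
  add_smooth(method = "lm")

# Both layers end up on the plot, in the order they were piped.
length(p$layers)
```

Every geom, stat, scale and theme function would need a wrapper like this, which gives a feel for how formulaic but widespread the change would be.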
If we can't make the pipe work with ggplot2, maybe it's time for ggplot3? ggplot3 could behave identically to ggplot2 in every way, except that it would compose plots using %>% instead of +. This would solve the pipe problem but would come with some major downsides:
ggplot3 would need substantial (if fairly formulaic) changes to almost
every function. This would be a lot of work.
What would happen when someone reported a bug in ggplot2? Would I fix it
only in ggplot3 and require users to upgrade? That seems unfair to ggplot2
users, so for every change, I'd need to make it simultaneously to ggplot2
and ggplot3, basically doubling all future development work.
Similarly, ggplot3 would create a fork in all other documentation (e.g.
stackoverflow and the ggplot2 book): you wouldn't be able to immediately
apply ggplot2 answers to ggplot3, and new answers created for ggplot3 wouldn't
immediately apply to ggplot2.
Overall, I think making this change just to use the pipe is not worthwhile.
The development version of wrapr can also use S3 dispatch to pipe through ggplot2. It also has a "cut in front" dispatch on the right method for treating nouns/objects as if they were functions.
Well, could you use the opportunity to incorporate into a new core product some of the best of the extensions?
With well over 100 gg.. packages it might help if the best were included in ggplot3
Stealing the best ideas from extension packages is not a good way to grow an ecosystem!
Why mention stealing?!
If you offered authors the opportunity to have their code incorporated and recognized - rather than just being one of 100's of extensions - they might be flattered
On a mini-scale, it is like the general concern that valuable R packages are being lost in the 12,000+ out there
Obviously sites like the extensions gallery are valuable.
Maybe, experienced users of ggplot extensions could come up with lists of their favourites and a kind of ggplot galaxy could evolve
All of the things that you describe can already happen. I don't think there's any advantage to making me the sole gatekeeper.
Certainly not. You have enough on your plate already. Have a great conference!
Interpreting the + as "Another Layer in the Chart" (which does not depend on any specific order) and the pipe %>% as "then" for passing objects to a function, I actually like the + symbol from the ggplot2 family.
Except order does matter when adding layers to a plot. So perhaps you prefer %>% after all?
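A quick way to see that order matters (a sketch, assuming ggplot2 is installed; layer_geoms is a made-up helper): the same two layers added in a different order are stored, and therefore drawn, in a different order.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(mpg, wt))
points_first <- base + geom_point() + geom_smooth(method = "lm")
smooth_first <- base + geom_smooth(method = "lm") + geom_point()

# Hypothetical helper: report the geom class of each layer, in drawing order.
layer_geoms <- function(p) {
  vapply(p$layers, function(l) class(l$geom)[1], character(1))
}

layer_geoms(points_first)
layer_geoms(smooth_first)
```

Later layers are drawn on top of earlier ones, so in the second plot the points sit on top of the smooth line instead of under it.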
Note: If you like pipes (and formulas), you might take a look at ggformula, which provides a formula interface (somewhere between what lattice uses and what ggvis uses) to ggplot2 and uses %>% rather than +. Version 0.6.2 just went to CRAN this week.
FWIW, the arguments against ggplot3, especially those below, are almost exactly the arguments users of graphics and lattice used against ggplot2. They also have the same structure as arguments that lead to major companies being displaced by newer and more agile innovators in their industry. I think we should expect that, at some point, someone will release a graphics library that uses pipes, and eventually ggplot2 will be displaced by such a library. Whether that library germinates in R, python or Julia remains to be seen, but we can bet that it won't be directly part of the tidyverse because of the overhead cost to the developers. That's a shame, but there's no easy way out of the competency trap.
I think they're the sort of arguments that prevent things like the python 2 -> python 3 mess. If you're going to require a bunch of people to rewrite a bunch of code there needs to be a big payoff - stylistic consistency isn't big enough.
I guess I'm confused as to why this doesn't satisfy?
library(ggplot2)
library(dplyr)
data.frame(x = rnorm(10), y = rnorm(10)) %>% ggplot(aes(x,y)) + geom_point()
Oh, it does— don't you worry!