Why put + and %>% at the end of lines?

The general practice is to put + at the end of the line when creating ggplots, and %>% at the end of the line when using the pipe.

ggplot(data = mpg, aes(x = displ, y = hwy)) +
    geom_point() +
    stat_smooth()

delays <- flights %>% 
    group_by(dest) %>% 
    summarise(distance = mean(distance))

However, this makes commenting out particular lines harder. If I comment out the last line in either example, I'll get an error because the + or %>% is just "hanging out there".

I'd rather do it the way below. This lets me comment out lines as I wish, and to me seems aesthetically more pleasing (like a bulleted list).

ggplot(data = mpg, aes(x = displ, y = hwy))
    + geom_point()
    + stat_smooth()

delays <- flights
     %>% group_by(dest) 
     %>% summarise(distance = mean(distance))

Yet these examples throw errors.
Why won't R let me do it either way?
And, if it must be one way, what are the reasons for the trailing + instead of the leading +?

Because R doesn't require an explicit statement terminator (like the semicolon in C/C++ and many other languages), the R parser treats a statement as finished as soon as it reaches the end of a line on which the code so far forms a complete expression. So, the following is a valid R statement:

delays <- flights

During execution, after reaching the end of that line, R will stop and execute that complete statement. It then reaches the next line

    %>% group_by(dest)

That is not a valid statement (and can't be turned into such by additional code), since %>% is a binary operator (along with +) and requires code on both sides of it.
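
If you want to see this for yourself, here's a rough sketch using base R's parse() on code held in strings, so nothing needs to be loaded. The variable names are just placeholders I made up. Note one difference between the two operators: a line starting with + is still valid syntax (it's a unary plus), so the leading-+ version parses as two separate expressions and only goes wrong later, whereas a line starting with %>% can't be parsed at all.

```r
# Trailing operator: at the end of line 1 the expression is incomplete,
# so the parser keeps reading and builds ONE expression.
trailing <- parse(text = "x <- 1 +\n    2")
n_trailing <- length(trailing)

# Leading `+`: line 1 is already complete, so line 2 becomes a separate
# expression (a unary plus). This parses, but as TWO expressions.
leading_plus <- parse(text = "x <- 1\n    + 2")
n_leading <- length(leading_plus)

# Leading %>%: a %op% operator has no unary form, so the second line
# cannot start an expression and parsing fails.
pipe_result <- tryCatch(
    parse(text = "delays <- flights\n    %>% group_by(dest)"),
    error = function(e) "syntax error"
)

print(c(trailing = n_trailing, leading = n_leading))
print(pipe_result)
```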

Allowing leading binary operators to continue statements would require changing R to look ahead to subsequent lines to see whether the statement is being continued. While that would allow your preferred style, I'm sure it would cause some problems. Even knowing next to nothing about the interpreter's inner workings, I can see that the following would be highly ambiguous:

x <- 1
-1

Both lines are valid statements, but the negation operator - could also be subtraction if you looked ahead.
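
A small sketch of how R currently resolves that ambiguity, again with parse() on strings (the environment and variable names are only for illustration). The second half relies on the fact that a newline inside an open parenthesis doesn't end the expression, which is why the same two lines give a different answer once they're wrapped in ( ).

```r
# Helper: parse a string and evaluate every expression in a fresh environment
run_in_env <- function(code) {
    env <- new.env()
    for (e in parse(text = code)) eval(e, envir = env)
    env
}

# As written: two statements. x is assigned 1; the -1 is evaluated and discarded.
x_plain <- run_in_env("x <- 1\n-1")$x

# Inside parentheses the expression is still open at the newline,
# so the same lines are read as the single subtraction 1 - 1.
x_wrapped <- run_in_env("x <- (1\n-1)")$x

print(c(plain = x_plain, wrapped = x_wrapped))
```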

Edit: If it's a problem on a regular basis while you develop code, you could end your %>% chains with {.}:

delays <- flights %>%
    group_by(dest) %>%
    #summarise(distance = mean(distance)) %>%
    {.}

That last statement is basically an identity operator for pipes and will not change the output. I'm not sure what the ggplot2 equivalent would be offhand.
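
Here's a quick check of that claim on a built-in data set, as a sketch that assumes the magrittr package is installed (dplyr re-exports the same pipe). The subset() and head() steps are stand-ins for whatever your real pipeline does.

```r
library(magrittr)

# Pipeline with the last "real" step commented out and {.} as the terminator
with_terminator <- mtcars %>%
    subset(cyl == 4) %>%
    #head(3) %>%
    {.}

# The same pipeline written without the terminator
without_terminator <- mtcars %>%
    subset(cyl == 4)

# {.} just hands back whatever was piped into it
same_result <- identical(with_terminator, without_terminator)
print(same_result)
```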

I'm planning to write a thing about identity elements for ggplot2, actually. geom_blank() or NULL do the trick.
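
For the ggplot2 case, a minimal sketch of the NULL variant, assuming ggplot2 is installed and using the mpg data from the original question. Adding NULL leaves the plot's layers as they are, so the last real layer can be commented out without leaving a dangling +.

```r
library(ggplot2)

p <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
    geom_point() +
    #stat_smooth() +
    NULL

# Only geom_point() was added; NULL contributed nothing
n_layers <- length(p$layers)
print(n_layers)
```

Swapping NULL for geom_blank() should behave the same on screen, though it does add a (non-drawing) layer to the plot object.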

You can also pipe into I or identity—which similarly terminate the pipeline without changing the output—to debug:

library(dplyr)

nycflights13::flights %>%
    group_by(dest) %>%
    #summarise(distance = mean(distance)) %>%
    I
#> # A tibble: 336,776 x 19
#> # Groups:   dest [105]
#>     year month   day dep_time sched_dep_time dep_delay arr_time
#>  * <int> <int> <int>    <int>          <int>     <dbl>    <int>
#>  1  2013     1     1      517            515         2      830
#>  2  2013     1     1      533            529         4      850
#>  3  2013     1     1      542            540         2      923
#>  4  2013     1     1      544            545        -1     1004
#>  5  2013     1     1      554            600        -6      812
#>  6  2013     1     1      554            558        -4      740
#>  7  2013     1     1      555            600        -5      913
#>  8  2013     1     1      557            600        -3      709
#>  9  2013     1     1      557            600        -3      838
#> 10  2013     1     1      558            600        -2      753
#> # ... with 336,766 more rows, and 12 more variables: sched_arr_time <int>,
#> #   arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,
#> #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
#> #   minute <dbl>, time_hour <dttm>

Technically, I adds the class AsIs to the object. That probably won't be a problem in any well-written code, but identity seems marginally safer (though more verbose).
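
To make the difference concrete, a short sketch (assuming magrittr is installed; mtcars is just a convenient built-in data frame):

```r
library(magrittr)

via_I        <- mtcars %>% I
via_identity <- mtcars %>% identity

# I() prepends "AsIs" to the class vector; identity() returns the object untouched
class_I        <- class(via_I)
class_identity <- class(via_identity)

print(class_I)
print(class_identity)
```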

Thanks! The example of x <- 1 followed by -1 really makes it clear what the issue is. The workarounds with {.} and geom_blank() are really helpful, and I will definitely incorporate those.

Yes, {.} is a super handy way to let you comment out any of your lines, including the last ‘proper’ one, without having to delete the pipe symbol from the end of a line.

@tjmahr has written this post now, and I read it and it's pretty darn great! :+1: @cricketbird, even though your question's been answered (@nick's such a champ on here, :trophy:), you might find it interesting nonetheless!

Thanks for the blog post (and thanks to @mara for giving us a heads-up).

FTR, I've been using an empty theme() call at the end of a ggplot2 statement (and had been using opts() before that) so that I can comment out the last (productive) step in a statement without breaking my code. It has worked for me, and I haven't had problems so far. If anyone has any objections to that, pray tell.

I only recently found out that identity() at the end of a pipeline does the same thing for me, and was delighted!