TL;DR: it's actually a pretty advanced topic to understand what's going on, and the solution is not satisfying. If you're a beginner, my best advice might be to just avoid mixing these pipes for now.
Understanding what's happening
What can be helpful is to use quote() to see how R parses the expression:
quote(df %<>% group_by(sequenced_case) |>
  mutate(seq_id = row_number()))
#> mutate(df %<>% group_by(sequenced_case), seq_id = row_number())
quote(df <- df %<>% group_by(sequenced_case) |>
  mutate(seq_id = row_number()))
#> df <- mutate(df %<>% group_by(sequenced_case), seq_id = row_number())
Created on 2023-07-24 with reprex v2.0.2
So, let's start with the first case. When you have X |> Y(), this gets automatically replaced by Y(X) before anything is evaluated.
So:
df %<>% group_by(sequenced_case) |>
  mutate(seq_id = row_number())
is equivalent to:
{df %<>% group_by(sequenced_case)} |>
  mutate(seq_id = row_number())
or
X |>
  mutate(seq_id = row_number())
where X is df %<>% group_by(sequenced_case).
And indeed it gets replaced by:
mutate(X, seq_id = row_number())
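A quick way to confirm this rewriting is to compare the two parsed expressions directly. This is a minimal sketch that only parses and never evaluates, so it assumes nothing beyond R 4.1 or later (no package needs to be loaded):

```r
# quote() captures the parsed expression without evaluating it,
# so neither {magrittr} nor {dplyr} has to be attached here.
piped <- quote(
  df %<>% group_by(sequenced_case) |>
    mutate(seq_id = row_number())
)
nested <- quote(
  mutate(df %<>% group_by(sequenced_case), seq_id = row_number())
)

# If the native pipe is rewritten at parse time, both objects are the same call
identical(piped, nested)
```

If this returns TRUE, the two spellings are indistinguishable to R once parsing is done.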
The reason this works like that is that R replaces the pipe before it even tries to evaluate the expression. Indeed, you will note that in my reprex, I did not load {magrittr}, so the code itself cannot run! Let's make it even more obvious, with a totally meaningless function name:
quote(
  a |> hfgfdsifd()
)
#> hfgfdsifd(a)
a |> hfgfdsifd()
#> Error in hfgfdsifd(a): could not find function "hfgfdsifd"
So, this is the difference between the base R pipe |> and the magrittr pipe %>%: while %>% is a function, |> is replaced before anything else happens.
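The difference can be seen in the parsed call itself. In this small sketch (x and f are placeholder names of my own, and nothing is evaluated), the first element of a call object is the function being called:

```r
# Parse only: x and f do not need to exist, and {magrittr} is not loaded.
e_magrittr <- quote(x %>% f())
e_native   <- quote(x |> f())

# For the magrittr pipe, the pipe operator itself is the function being called
as.character(e_magrittr[[1]])
#> [1] "%>%"

# For the native pipe, no trace of the pipe is left: the call is simply f(x)
as.character(e_native[[1]])
#> [1] "f"
```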
Now to your second case:
df <- df %<>% group_by(sequenced_case) |>
  mutate(seq_id = row_number())
this can be rewritten:
df <- X |> mutate(seq_id = row_number())
where X is again df %<>% group_by(sequenced_case), because <- binds less tightly than both pipes. So, as the quote() output above shows, the whole thing gets rewritten as:
df <- mutate(X, seq_id = row_number())
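To see where each operator ends up in this second case, the parsed call can be taken apart piece by piece. This is a small sketch that relies only on quote() and list-style indexing of call objects, so again nothing is evaluated and no package is needed:

```r
e <- quote(
  df <- df %<>% group_by(sequenced_case) |>
    mutate(seq_id = row_number())
)

# Outermost call: the assignment
as.character(e[[1]])
#> [1] "<-"

# Right-hand side of the assignment: the mutate() call produced by |>
as.character(e[[3]][[1]])
#> [1] "mutate"

# First argument of mutate(): the %<>% call
as.character(e[[3]][[2]][[1]])
#> [1] "%<>%"
```

This matches the nesting printed by quote() earlier: the assignment is outermost, mutate() is its right-hand side, and the %<>% call sits inside mutate() as its first argument.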
Solutions
OK now we understand (I hope) why the native pipe behaves like that, but obviously that's not what you mean. So how do you make clear what you want?
You can tell R how to organize expressions using {}. So, a natural approach is to try:
df %<>% {
  group_by(sequenced_case) |>
    mutate(seq_id = row_number())
}
We can check its parsing by R:
quote(
  df %<>% {
    group_by(sequenced_case) |>
      mutate(seq_id = row_number())
  }
)
#> df %<>% {
#>     mutate(group_by(sequenced_case), seq_id = row_number())
#> }
So here the association is correct: the group_by() is indeed inside the mutate().
However, if you run this, it fails with the error message object 'sequenced_case' not found. Indeed, from the point of view of %<>%, the right-hand expression is a call to the function {, so it will fail to pass df as the first argument to group_by(). One solution is then to pass the argument along ourselves; for that we need a function:
df %<>% {\(.df)
  .df %>% group_by(sequenced_case) |>
    mutate(seq_id = row_number())
}()
which does what you want.
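As a sanity check, here is a minimal sketch on a made-up data frame. The toy values are my own assumption (your real df will differ), and it assumes {dplyr} and {magrittr} are installed:

```r
library(dplyr)
library(magrittr)

# Toy data, invented for illustration only
df <- data.frame(sequenced_case = c("a", "a", "b", "b", "b"))

# %<>% passes df to the anonymous function, then assigns the result back to df
df %<>% {\(.df)
  .df %>% group_by(sequenced_case) |>
    mutate(seq_id = row_number())
}()

# df itself has been updated and now carries a counter that restarts in each group
df$seq_id
```

With this toy input, seq_id should restart at 1 for each value of sequenced_case.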
Conclusions?
If this looks disappointing, I'm with you! But I don't think there is any easier solution (please let me know if you can think of one!)
At a very fundamental level, this is a consequence of design decisions that the base R pipe was criticized for when it came out. I'm afraid the solution is either not to mix the base pipe with the more exotic magrittr pipes, or to be ready for more complex code.
And yes, we could wish that the designers of the R language had avoided this kind of confusing situation, but that's what happens with living languages: the magrittr pipes are relatively recent, the |> pipe even more so, and all of this is the result of (very clever) humans trying their best without knowing the future.