Two pipes: %>% and %<>%

Hi, I have two pipelines. The second one, which uses %<>%, is not working for me: the result is printed (correctly) on screen,
but I do not get a new variable called seq_id and df is not updated.
Why is that?
The first one works as expected. I am on Windows with R 4.3.

df <- structure(list(Patient_ID = c(
  "1", "2", "3", "4", "5", "6", "7",
  "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18",
  "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29",
  "30", "31", "32"
), sequenced_case = c(
  "Yes", "Yes", "Yes", "Yes",
  "Yes", "Yes", NA, "Yes", NA, NA, "Yes", "Yes", "Yes", NA, "Yes",
  "Yes", "Yes", "Yes", "Yes", NA, NA, "Yes", "Yes", "Yes", "Yes",
  "Yes", NA, "Yes", "Yes", "Yes", "Yes", NA
)), row.names = c(
  NA,
  -32L
), class = "data.frame")


df <- df %>%
  group_by(sequenced_case) |>
  mutate(seq_id = row_number())

 df %<>% group_by(sequenced_case) |>
  mutate(seq_id = row_number())

library("magrittr")

And what do you mean by that?
I had that library loaded beforehand, of course; otherwise it would be throwing an error.

That second pipe, %<>%, comes from the magrittr package. It does two things at once: first it pipes your data as usual, and then it assigns the result back to the data set.

For example:
A <- A %>% select(col1)

This is equivalent to:
A %<>% select(col1)
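
As a small illustration of that equivalence, here is a minimal sketch that uses a made-up numeric vector instead of a data frame (the names x, y and z are only for illustration), assuming magrittr is installed:

```r
library(magrittr)

x <- c(4, 9, 16)   # illustrative vector, not taken from the thread

y <- x
y <- y %>% sqrt()  # explicit reassignment with the ordinary pipe

z <- x
z %<>% sqrt()      # assignment pipe: pipes z into sqrt() and stores the result in z

identical(y, z)    # both routes should leave the same values behind
```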

In my case it is not assigning, and that is exactly what my question is about: why?

This may sound silly, but how do you know it's not assigning? In the example you posted, shouldn't the result of the second assignment be identical to the result of the first?

Because in my environment pane df still has 2 columns, instead of the 3 I would expect.
This gives me 3 columns:

df <- df %>%
  group_by(sequenced_case) |>
  mutate(seq_id = row_number())

This gives me 2 columns (nothing changes in df in global environment):

df %<>% group_by(sequenced_case) |>
  mutate(seq_id = row_number())

Maybe this is related to my R installation or my laptop, I do not know. Do both work for you?

It looks like you're creating seq_id twice, and the second time just replaces the first.

What do you mean? I always start by creating a fresh df.

Maybe I misunderstood. Here's the relevant part of your post.

Try using %>% instead of |>. Not sure what is happening.

df %<>% group_by(sequenced_case) %>%
  mutate(seq_id = row_number())

I think the issue actually is with using the base pipe as your second pipe. See the example below:

library(magrittr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df1 <- df2 <- df3 <- structure(list(Patient_ID = c(
  "1", "2", "3", "4", "5", "6", "7",
  "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18",
  "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29",
  "30", "31", "32"
), sequenced_case = c(
  "Yes", "Yes", "Yes", "Yes",
  "Yes", "Yes", NA, "Yes", NA, NA, "Yes", "Yes", "Yes", NA, "Yes",
  "Yes", "Yes", "Yes", "Yes", NA, NA, "Yes", "Yes", "Yes", "Yes",
  "Yes", NA, "Yes", "Yes", "Yes", "Yes", NA
)), row.names = c(
  NA,
  -32L
), class = "data.frame")


df1 <- df1 %>%
  group_by(sequenced_case) |>
  mutate(seq_id = row_number())

df2 %<>% 
  group_by(sequenced_case) |>
  mutate(seq_id = row_number())
#> # A tibble: 32 × 3
#> # Groups:   sequenced_case [2]
#>    Patient_ID sequenced_case seq_id
#>    <chr>      <chr>           <int>
#>  1 1          Yes                 1
#>  2 2          Yes                 2
#>  3 3          Yes                 3
#>  4 4          Yes                 4
#>  5 5          Yes                 5
#>  6 6          Yes                 6
#>  7 7          <NA>                1
#>  8 8          Yes                 7
#>  9 9          <NA>                2
#> 10 10         <NA>                3
#> # ℹ 22 more rows

df3 %<>% 
  group_by(sequenced_case) %>%
  mutate(seq_id = row_number())


df1
#> # A tibble: 32 × 3
#> # Groups:   sequenced_case [2]
#>    Patient_ID sequenced_case seq_id
#>    <chr>      <chr>           <int>
#>  1 1          Yes                 1
#>  2 2          Yes                 2
#>  3 3          Yes                 3
#>  4 4          Yes                 4
#>  5 5          Yes                 5
#>  6 6          Yes                 6
#>  7 7          <NA>                1
#>  8 8          Yes                 7
#>  9 9          <NA>                2
#> 10 10         <NA>                3
#> # ℹ 22 more rows
df2
#> # A tibble: 32 × 2
#> # Groups:   sequenced_case [2]
#>    Patient_ID sequenced_case
#>    <chr>      <chr>         
#>  1 1          Yes           
#>  2 2          Yes           
#>  3 3          Yes           
#>  4 4          Yes           
#>  5 5          Yes           
#>  6 6          Yes           
#>  7 7          <NA>          
#>  8 8          Yes           
#>  9 9          <NA>          
#> 10 10         <NA>          
#> # ℹ 22 more rows
df3
#> # A tibble: 32 × 3
#> # Groups:   sequenced_case [2]
#>    Patient_ID sequenced_case seq_id
#>    <chr>      <chr>           <int>
#>  1 1          Yes                 1
#>  2 2          Yes                 2
#>  3 3          Yes                 3
#>  4 4          Yes                 4
#>  5 5          Yes                 5
#>  6 6          Yes                 6
#>  7 7          <NA>                1
#>  8 8          Yes                 7
#>  9 9          <NA>                2
#> 10 10         <NA>                3
#> # ℹ 22 more rows

Created on 2023-07-23 with reprex v2.0.2


Thank you for pointing that out.

Is this another bug or compatibility issue in R that needs fixing, tweaking, installing, reinstalling, raising issues on GitHub, etc.?
I would assume that when a new base pipe is introduced, it works correctly. Very often reality turns out to be different, which makes it difficult for beginners to learn.

When I run your code I get three columns both times and they're identical.

> df <- structure(list(Patient_ID = c(
+   "1", "2", "3", "4", "5", "6", "7",
+   "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18",
+   "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29",
+   "30", "31", "32"
+ ), sequenced_case = c(
+   "Yes", "Yes", "Yes", "Yes",
+   "Yes", "Yes", NA, "Yes", NA, NA, "Yes", "Yes", "Yes", NA, "Yes",
+   "Yes", "Yes", "Yes", "Yes", NA, NA, "Yes", "Yes", "Yes", "Yes",
+   "Yes", NA, "Yes", "Yes", "Yes", "Yes", NA
+ )), row.names = c(
+   NA,
+   -32L
+ ), class = "data.frame")
> 
> 
> df <- df %>%
+   group_by(sequenced_case) |>
+   mutate(seq_id = row_number())
> 
> df
# A tibble: 32 × 3
# Groups:   sequenced_case [2]
   Patient_ID sequenced_case seq_id
   <chr>      <chr>           <int>
 1 1          Yes                 1
 2 2          Yes                 2
 3 3          Yes                 3
 4 4          Yes                 4
 5 5          Yes                 5
 6 6          Yes                 6
 7 7          NA                  1
 8 8          Yes                 7
 9 9          NA                  2
10 10         NA                  3
# ℹ 22 more rows
# ℹ Use `print(n = ...)` to see more rows
> 
> df %<>% group_by(sequenced_case) |>
+   mutate(seq_id = row_number())
# A tibble: 32 × 3
# Groups:   sequenced_case [2]
   Patient_ID sequenced_case seq_id
   <chr>      <chr>           <int>
 1 1          Yes                 1
 2 2          Yes                 2
 3 3          Yes                 3
 4 4          Yes                 4
 5 5          Yes                 5
 6 6          Yes                 6
 7 7          NA                  1
 8 8          Yes                 7
 9 9          NA                  2
10 10         NA                  3
# ℹ 22 more rows
# ℹ Use `print(n = ...)` to see more rows
> 
> df
# A tibble: 32 × 3
# Groups:   sequenced_case [2]
   Patient_ID sequenced_case seq_id
   <chr>      <chr>           <int>
 1 1          Yes                 1
 2 2          Yes                 2
 3 3          Yes                 3
 4 4          Yes                 4
 5 5          Yes                 5
 6 6          Yes                 6
 7 7          NA                  1
 8 8          Yes                 7
 9 9          NA                  2
10 10         NA                  3
# ℹ 22 more rows
# ℹ Use `print(n = ...)` to see more rows

The OP did not run the code as a single chunk. If the df you feed into the %<>% code already has the new column, that column will still be there afterwards, which hides the problem. Instead of recreating the original df, just save the first output as df1.

library(tidyverse)
library(magrittr)
#> 
#> Attaching package: 'magrittr'
#> The following object is masked from 'package:purrr':
#> 
#>     set_names
#> The following object is masked from 'package:tidyr':
#> 
#>     extract

df <- structure(list(Patient_ID = c(
  "1", "2", "3", "4", "5", "6", "7",
  "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18",
  "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29",
  "30", "31", "32"
), sequenced_case = c(
  "Yes", "Yes", "Yes", "Yes",
  "Yes", "Yes", NA, "Yes", NA, NA, "Yes", "Yes", "Yes", NA, "Yes",
  "Yes", "Yes", "Yes", "Yes", NA, NA, "Yes", "Yes", "Yes", "Yes",
  "Yes", NA, "Yes", "Yes", "Yes", "Yes", NA
)), row.names = c(
  NA,
  -32L
), class = "data.frame")

# do not alter df in this step 

df1 <- df %>%
    group_by(sequenced_case) |>
    mutate(seq_id = row_number())
df1
#> # A tibble: 32 × 3
#> # Groups:   sequenced_case [2]
#>    Patient_ID sequenced_case seq_id
#>    <chr>      <chr>           <int>
#>  1 1          Yes                 1
#>  2 2          Yes                 2
#>  3 3          Yes                 3
#>  4 4          Yes                 4
#>  5 5          Yes                 5
#>  6 6          Yes                 6
#>  7 7          <NA>                1
#>  8 8          Yes                 7
#>  9 9          <NA>                2
#> 10 10         <NA>                3
#> # ℹ 22 more rows

# use the original df as input

df %<>% 
  group_by(sequenced_case) |>
  mutate(seq_id = row_number())
#> # A tibble: 32 × 3
#> # Groups:   sequenced_case [2]
#>    Patient_ID sequenced_case seq_id
#>    <chr>      <chr>           <int>
#>  1 1          Yes                 1
#>  2 2          Yes                 2
#>  3 3          Yes                 3
#>  4 4          Yes                 4
#>  5 5          Yes                 5
#>  6 6          Yes                 6
#>  7 7          <NA>                1
#>  8 8          Yes                 7
#>  9 9          <NA>                2
#> 10 10         <NA>                3
#> # ℹ 22 more rows
df
#> # A tibble: 32 × 2
#> # Groups:   sequenced_case [2]
#>    Patient_ID sequenced_case
#>    <chr>      <chr>         
#>  1 1          Yes           
#>  2 2          Yes           
#>  3 3          Yes           
#>  4 4          Yes           
#>  5 5          Yes           
#>  6 6          Yes           
#>  7 7          <NA>          
#>  8 8          Yes           
#>  9 9          <NA>          
#> 10 10         <NA>          
#> # ℹ 22 more rows

Created on 2023-07-23 with reprex v2.0.2


I agree with @StatSteph - the issue is with mixing the pipes.
I don't know why, but the second expression is being interpreted as follows (note the brackets):

(df %<>% group_by(sequenced_case)) |>
  mutate(seq_id = row_number())

i.e. it just does the group_by() and assigns the result to df. The result of the mutate() is then never assigned.

I don't know how magrittr works out how to bind expressions, but it looks like the "new" pipe (|>) is closing off the first.
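
One likely explanation is operator precedence rather than anything magrittr does: according to R's ?Syntax help page, %any% operators and |> share one precedence level and group from left to right, so the %<>% call is already complete by the time |> is applied. The sketch below imitates this in base R (4.1 or later) with a hypothetical toy operator, %update%, which is only a stand-in for illustration and not how magrittr implements %<>%:

```r
# Hypothetical stand-in for %<>% (NOT magrittr's implementation): applies `f`
# to the left-hand variable and stores the result back under the same name.
`%update%` <- function(lhs, f) {
  name <- deparse(substitute(lhs))
  value <- f(lhs)
  assign(name, value, envir = parent.frame())
  invisible(value)
}

x <- c(16, 9, 4)

# Parsed as sqrt(x %update% rev): the rev() step is stored in x,
# while the sqrt() step is only returned.
result <- x %update% rev |> sqrt()

x       # reversed, but not square-rooted
result  # the square roots, which would be lost without `result <-`
```

Here x plays the role of df, rev() the role of group_by(), and sqrt() the role of mutate().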

I must admit I had never seen %<>% before. Having mourned the lack of C-like plus-equals operators, I have simply gone along with df <- df %>% etc.

What is weirdly interesting, this one works:

df <- df %<>% group_by(sequenced_case) |>
  mutate(seq_id = row_number())

TL;DR: it's actually a pretty advanced topic to understand what's going on, and the solution is not satisfying. If you're a beginner, my best advice might be to just avoid mixing these pipes for now.

Understanding what's happening

What can be helpful is to use quote() to see how R parses the expression:

quote(df %<>% group_by(sequenced_case) |>
        mutate(seq_id = row_number()))
#> mutate(df %<>% group_by(sequenced_case), seq_id = row_number())


quote(df <- df %<>% group_by(sequenced_case) |>
        mutate(seq_id = row_number()))
#> df <- mutate(df %<>% group_by(sequenced_case), seq_id = row_number())

Created on 2023-07-24 with reprex v2.0.2

So, let's start with the first case. When you have X |> Y(), this gets automatically replaced by Y(X) before anything is evaluated.

So:

df %<>% group_by(sequenced_case) |>
        mutate(seq_id = row_number())

is equivalent to:

{df %<>% group_by(sequenced_case)} |>
        mutate(seq_id = row_number())

or

X |>
       mutate(seq_id = row_number())

where X is df %<>% group_by(sequenced_case).

And indeed it gets replaced by:

mutate(X, seq_id = row_number())

The reason this works like that is that R replaces the pipe before it even tries to evaluate the expression. Indeed, you will note that in my reprex, I did not load {magrittr}, so the code itself can not run! Let's make it even more obvious, with a totally meaningless function name:

quote(
  a |> hfgfdsifd()
)
#> hfgfdsifd(a)
a |> hfgfdsifd()
#> Error in hfgfdsifd(a): could not find function "hfgfdsifd"

Created on 2023-07-24 with reprex v2.0.2

So, this is the difference between the base R pipe |> and the magrittr pipe %>%: while %>% is a function, |> is replaced before anything else happens.
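
The same contrast can be seen directly in the parse tree. A minimal sketch in base R (4.1 or later); the names a and f are placeholders, and magrittr does not need to be loaded because nothing is evaluated:

```r
native <- quote(a |> f())
magrittr_style <- quote(a %>% f())

native[[1]]          # the function being called is f: the pipe is already gone
magrittr_style[[1]]  # the function being called is `%>%` itself

identical(native, quote(f(a)))  # the native pipe leaves no trace after parsing
```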

Now to your second case:

df <- df %<>% group_by(sequenced_case) |>
        mutate(seq_id = row_number())

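this can be rewritten:

df <- X |> mutate(seq_id = row_number())

where X is df %<>% group_by(sequenced_case), because <- binds less tightly than either pipe. So the whole thing gets rewritten as:

df <- mutate(X, seq_id = row_number())

Evaluating X still stores only the grouped data in df, but the outer <- then overwrites df with the result of mutate(), which is why this version ends up with three columns.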

Solutions

OK now we understand (I hope) why the native pipe behaves like that, but obviously that's not what you mean. So how do you make clear what you want?

You can tell R how to organize expressions using {}. So, a natural approach is to try:

df %<>% {
  group_by(sequenced_case) |>
    mutate(seq_id = row_number())
}

We can check its parsing by R:

quote(
  df %<>% {
    group_by(sequenced_case) |>
      mutate(seq_id = row_number())
  }
)
#> df %<>% {
#>     mutate(group_by(sequenced_case), seq_id = row_number())
#> }

So here the association is correct, the group_by() is indeed inside the mutate().

However, if you run this, it fails with the error message object 'sequenced_case' not found. Indeed, from the point of view of %<>%, the right-hand expression is a call to the { function, so df is never passed as the first argument to group_by(). One solution is to pass the argument along ourselves; for that we need a function:

df %<>% {\(.df)
  .df %>% group_by(sequenced_case) |>
    mutate(seq_id = row_number())
  }()

which does what you want.
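
A possibly simpler variant, offered as a sketch rather than a definitive answer: magrittr's documentation describes evaluating a braced right-hand side with . bound to the left-hand side, so passing . to group_by() explicitly may be enough. The small data frame below is made up for illustration, and the example assumes magrittr and dplyr are installed:

```r
library(magrittr)
library(dplyr)

# Illustrative data, much smaller than the df in the thread
df <- data.frame(
  Patient_ID = as.character(1:6),
  sequenced_case = c("Yes", "Yes", NA, "Yes", NA, "Yes")
)

# The braces make the whole block the right-hand side of %<>%;
# `.` stands for df, so the native pipe inside never sees %<>%.
df %<>% {
  group_by(., sequenced_case) |>
    mutate(seq_id = row_number())
}

ncol(df)  # counts the columns now stored in df
```

If this behaves as expected, df ends up with three columns. Using %>% for every step, or writing a plain df <- df |> ..., avoids the issue altogether.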

Conclusions?

If this looks disappointing, I'm with you! But I don't think there is any easier solution (please let me know if you can think of one!)

At a very fundamental level, this is the consequence of design decisions for which the base R pipe was criticized when it came out. I'm afraid the solution is to not mix the base pipe with the more exotic magrittr pipes, or to be ready for more complex code.

And yes, we could wish that the designers of the R language had avoided this type of confusing situations, but that's what happens with living languages: the magrittr pipes are relatively recent, the |> pipe even more, all of this is the result of (very clever) humans trying their best without knowing the future.


Thank you so much for the comprehensive explanation.
