I was looking for proper usage of pipe operators and found the R Style Guide book by Hadley Wickham. When reading the section on pipe operators, I noticed this in 4.6:
The magrittr package provides the %<>% operator as a shortcut for modifying an object in place. Avoid this operator.
I was confused, since I have seen some examples that use the %<>% operator, and I am wondering whether there are particular reasons for not using it.
Modifying in place is useful if and only if you absolutely mean to do it. It also means that if you run the same code twice, you'll (often) get different results, since you've overwritten the original data or variable. So, basically, it's a powerful operator that you can use, but it's one that we avoid for the most part in the tidyverse.
Style guides are really general patterns which, of course, have appropriate exceptions.
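To make the "run it twice, get different results" point concrete, here is a minimal sketch. It assumes magrittr is installed; the vector x and the use of magrittr's add() alias are illustrative choices, not something from the style guide.

```r
library(magrittr)

x <- c(1, 2, 3)

# Plain pipe, result stored under a new name: x is untouched,
# so rerunning this line gives the same y every time.
y <- x %>% add(1)
y <- x %>% add(1)

# Compound assignment pipe: each run overwrites x,
# so running the same line twice adds 1 twice.
x %<>% add(1)
x %<>% add(1)

y  # still c(2, 3, 4)
x  # now c(3, 4, 5)
```

If the script is always run from the top this makes no difference, but when lines are rerun interactively the second form depends on how many times it was executed.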
I just wanted to hop on and say thanks for this question. I love it! So many questions on here are concrete, technical questions, and while I know that's appropriate, I personally enjoy the freedom we have here to ask opinion- and discussion-based questions.
I agree that its use is pretty limited. For example, the %<>% operator would not be appropriate in the following pseudocode:
    # the original dataset just disappears with the summary
    df %<>%
      group_by("some grouping variables") %>%
      summarize("some summaries")

    # for summaries it's better to have a new dataset, e.g.
    df_summary <-
      df %>%
      group_by("some grouping variables") %>%
      summarize("some summaries")
So my conclusion is I'm going to use this operator only when cleaning/wrangling a dataset.
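The pseudocode above can be sketched as runnable code under some assumptions: dplyr and magrittr are installed, the built-in mtcars dataset stands in for df, and the cleaning and summary steps are made up for illustration.

```r
library(dplyr)
library(magrittr)

cars <- mtcars  # work on a copy of the built-in dataset

# Cleaning/wrangling: overwriting is the intent, so %<>% fits here.
# Note that %<>% has to be the first pipe in the chain.
cars %<>%
  filter(!is.na(mpg)) %>%
  mutate(cyl = as.factor(cyl))

# Summary: keep the cleaned data and store the result under a new name.
cars_summary <-
  cars %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg))

nrow(cars)          # the cleaned data is still available
nrow(cars_summary)  # one row per cylinder group
```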
Thanks again for your great answer, and if you have any comments, please let me know 😊
My opinion/usage:
There is absolutely nothing wrong with using %<>% if you know what you are doing, i.e. you are aware of the pitfalls mentioned above.
Agreed. I am a big fan of it, because when you clean data the point is that you DO want to overwrite the data. You don't make a new variable or dataset for every stage. Since R scripts are reproducible, if you make a mistake you can always rerun everything up to the mistake and change it. No biggie. I spent a long time looking for a way to put the compound pipe in the addins; AddinexamplesWV has it, and after you add it you can create a keyboard shortcut, just like for %>%.
So the %<>% operator is just a shortcut for
df = df %>%
do something?
So it takes a dataset, runs some stuff and replaces it? Is this correct?
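That reading matches how magrittr describes the operator: it pipes the left-hand side into the right-hand side and assigns the result back to the left-hand side. A small sketch of the equivalence, assuming magrittr is installed (the vectors a and b are hypothetical):

```r
library(magrittr)

a <- c(4, 9, 16)
b <- a

# Explicit form: pipe, then assign the result back to the same name.
a <- a %>% sqrt()

# Compound form: %<>% pipes b into sqrt() and assigns the result to b.
b %<>% sqrt()

identical(a, b)  # TRUE
```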
Clearly it can be confusing, but it also removes the need to find new names (or to add suffixes such as 1, 2, 3) in a longer pipeline of processing steps. As long as everything is run from the start...
I'm not sure how it can be confusing: it does precisely what it states. It cannot even be easily used by accident because magrittr needs to be loaded first and there is no inbuilt keyboard shortcut.
The original question was about a style guide, which is inevitably a subjective judgement as to what makes sense and is user-friendly.
I will also point out that using = instead of <- to assign objects IS much more strongly discouraged, since it is best to reserve = for arguments. It could be thought of as a style thing, but it has become a stronger convention.
Whilst I prefer <- over =, this is not a convention. It's just a personal preference which most R users appear to share but which is very far from universal. Even one of R's original authors advocates = and strongly dislikes <-.
The reason I raise this is that I have observed new users being told not to use = which I think is a wrong message to give them (particularly when there are so many more important issues).
Wow, that's an interesting topic too, because I'm one of those people who have been told to use <- rather than =. I wonder why the recommendation for this assignment operator has changed?
Yes, I think there is a generational-change aspect to it, so it may be a convention among "newer" users (6-year user here; not sure how "new" that makes me, but certainly not original-author "old"). I don't see what's wrong with discouraging = for object assignment, though. I think it helps beginners understand how objects differ from arguments, which I think is why it is discouraged.
To help beginners understand and conceptually demarcate the difference between setting an argument, assigning an object, and testing equality. See Hadley's two responses
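As a rough illustration of that distinction (a sketch in base R; the helper names with_equals and with_arrow are hypothetical and exist only to give each case its own local scope):

```r
with_equals <- function() {
  mean(x = c(1, 2, 3))   # inside a call, = names the argument x of mean()
  "x" %in% ls()          # was a local variable x created?
}

with_arrow <- function() {
  mean(x <- c(1, 2, 3))  # inside a call, <- assigns x, then passes its value
  "x" %in% ls()
}

with_equals()  # FALSE: no variable was created
with_arrow()   # TRUE: x now exists in the function's scope

z <- 3   # assigning an object
z == 3   # testing equality, TRUE
```

At the top level, z = 3 and z <- 3 behave the same; the difference only shows up inside a function call, which is where the two are easiest to mix up.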