Many of my team members are slowly making a switch from base R to tidyverse. To help their transition, I wrote a blog post that maps base R functions to equivalent tidyverse functions. I would really appreciate it if someone could review the list and point out any mistakes/corrections. Thanks!
One small thing in the intro paragraph, you swapped the 't' and 'i' in 'switch'...
Hopefully the table below helps you swtich from base R
A few mostly un-researched notes:
- `distinct` works on data frames/tibbles, not vectors.
- I would personally put `do` along with `summarize`, not `map`, as I think of the former as "collapsing a data frame in a controlled manner" and the latter as "get an output for each input".
- Depending on your target audience, `replicate` → `rerun` may be helpful.
- Since it doesn't even seem to be in the index for `purrr` yet, it's easy to miss `pluck` for extracting items from lists (and the use of `map` to do it iteratively), but it's a common enough scenario that seems worth listing.
- `aggregate` is a good general-case `cumsum`-like function that could go in the "See Also" for that line.
This always comes up in discussion on SO. Is `do` being deprecated in favour of `map`, @hadley? All we have is a saved tweet from you long ago.
Ah, I vaguely remember that discussion, but I never really updated my coding style to account for it (and then had to deal with a problem where multidplyr was great for parallel processing, even further cementing `do` in my mind). I'll have to remember to try that next time I'm dealing with that kind of problem.
I think list-columns + `map()` is easier to use and reason about than `rowwise()` + `do()`.
There's a bigger learning curve (particularly since we don't have a great guide to all the ideas in one place), but I think the ideas generalise more readily to other domains.
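For anyone who hasn't seen the list-column pattern, a rough sketch of what it can look like is below. It uses the built-in `mtcars` data and an arbitrary `mpg ~ wt` model chosen purely for illustration, and it assumes tidyr 1.0 or later for the `nest(data = -cyl)` syntax.

```r
library(dplyr)
library(tidyr)
library(purrr)

models <- mtcars %>%
  nest(data = -cyl) %>%                               # one row per cyl; the rest lands in a list-column
  mutate(
    fit   = map(data, ~ lm(mpg ~ wt, data = .x)),     # one model per group, stored as a list-column
    slope = map_dbl(fit, ~ coef(.x)[["wt"]])          # pull a single number back out of each model
  )
```

The data, the model, and the summary for each group stay on the same row, which is the part that tends to be easier to reason about.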
I love reading all these, because I always learn something new. I am `tidyverse` only and have been for about 9 months, but I had no idea `if_else` existed until I read this. Will switch!
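As a quick illustration of why `if_else` can be worth the switch, here is a small sketch of the type-strictness difference. The vector `x` is made up for the example.

```r
library(dplyr)

x <- c(1, NA, 3)   # hypothetical input

# Base ifelse() silently coerces when the two branches have different types:
# the result here is a character vector.
base_result <- ifelse(is.na(x), "missing", x)

# if_else() works as expected when both branches have compatible types.
tidy_result <- if_else(is.na(x), 0, x)

# With incompatible types (character vs. double), if_else() raises an error
# instead of coercing; tryCatch() is used here only to capture that fact.
mismatch <- tryCatch(
  if_else(is.na(x), "missing", x),
  error = function(e) "error"
)
```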
I couldn't agree more. List-columns have changed the way I execute models against grouped data. In my experience, they've cut code bloat and run time down tremendously.
Karl Broman wrote a nice piece in that vein, though it could use a little updating:
hipsteR: re-educating people who learned R before it was cool
Thanks, @rajkorde! This is really helpful. I have been trying to use the tidyverse whenever I can but it's hard to break old habits when they're all you know. It's useful to see so many side-by-side comparisons. I would love to be "tidyverse only" like @rkahne. Maybe this will get me there.
Thank you all so much for your comments and suggestions. I have updated the post with all the recommended fixes. Thanks again!
This is great! What would you think of adding an extra column identifying the package of the tidyverse function, maybe with links to their website where applicable? I'd be happy to help contribute to that.
Thanks @rajkorde, this is such a great resource!
I am in the process of switching from plyr/reshape2 to dplyr/tidyr/purrr, and I was wondering if anyone knows of a similar table of equivalences between plyr and dplyr?
Not a table, but http://jimhester.github.io/plyrToDplyr/ has parallel code using plyr/reshape and dplyr/reshape2 going through the original plyr examples. It might still be useful to see equivalents of common operations.
At the time I made the page, tidyr/purrr did not exist, which is why they don't appear.
I also have side by side code, base vs dplyr, for a set of data aggregation operations here:
https://jennybc.github.io/purrr-tutorial/bk01_base-functions.html
I don't include plyr, but address it in comments. Leaving plyr behind was really, really hard for me, but I've finally done it.
@jimhester @jennybryan Thank you both for these links!
Like you, @jennybryan, plyr has been my main coding framework in R for many, many years... I guess what makes it hard for me to switch is that I do a lot of list <--> data.frame operations (ldply being a favourite), so I have to learn not only dplyr but also purrr.
Those long-time habits are hard to lose!
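For the list <--> data.frame round trip, one possible translation is sketched below. The `samples` list is made-up data, and `map()` followed by `bind_rows()` is just one of several ways to stand in for `ldply`; treat it as a starting point rather than an exact equivalent.

```r
library(dplyr)
library(purrr)

samples <- list(a = 1:3, b = 4:6)   # hypothetical named list

# plyr version, for comparison:
#   ldply(samples, function(x) data.frame(mean = mean(x), n = length(x)))

# list -> data frame: build one small tibble per element, then stack them.
# .id keeps the list names as a column (plyr calls that column ".id").
summary_df <- samples %>%
  map(~ tibble(mean = mean(.x), n = length(.x))) %>%
  bind_rows(.id = "sample")

# data frame -> list: base split() still does the job.
by_sample <- split(summary_df, summary_df$sample)
```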
Loving this comparison, @rajkorde
I've started using list columns and `broom` to organise my GLMs (where previously I had a named list and was using a separate data frame for the metadata, with the names as keys), and it makes them a lot easier to manage. The syntax with `map` is a little bit trickier than (relatively) vanilla dplyr verbs, but the benefits for keeping models and their metadata tidy are amazing.
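A rough sketch of that list-column + `broom` workflow might look like the following. The grouping by `cyl` and the default (Gaussian) `glm(mpg ~ wt)` on `mtcars` are assumptions picked only so the example runs; they are not the models described in the post above.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

coefs <- mtcars %>%
  nest(data = -cyl) %>%                                # the group key sits beside its data
  mutate(
    fit    = map(data, ~ glm(mpg ~ wt, data = .x)),    # one GLM per group (Gaussian by default)
    tidied = map(fit, tidy)                            # broom::tidy() gives a coefficient table per model
  ) %>%
  unnest(tidied) %>%                                   # one row per group and term
  select(cyl, term, estimate, p.value)
```

Because the group key travels with each model, there is no separate lookup table of names to keep in sync.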
Worth mentioning that most (all?) tidyverse functions expect a data frame as input.
And on the base side I would add the following:
- `ifelse(is.na(…), …)`: add `complete.cases()` too.
- `ifelse(…, NA)`: add or replace with `mtcars[ mtcars$cyl == 4, ] <- NA`
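For reference, the base idioms suggested above could look like this. The small `df` is made-up data, and `mtcars` is copied first so the built-in dataset is left alone.

```r
df <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))   # hypothetical data

# complete.cases() returns TRUE for rows with no NA in any column.
complete_rows <- df[complete.cases(df), ]

# ifelse(is.na(...), ...) replaces missing values one vector at a time.
x_filled <- ifelse(is.na(df$x), 0, df$x)

# Logical-index assignment sets every column of the matching rows to NA.
cars <- mtcars
cars[cars$cyl == 4, ] <- NA
```

Note that the last form blanks out whole rows, whereas `ifelse(…, NA)` only touches the one vector it is applied to.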