How can I create a package like dplyr?

phamdinhkhanh · May 2, 2018, 3:22pm

Hi everybody,
I am from Vietnam, where the R community is growing fast. However, I think we have mostly stopped at applying existing packages, and our weakness is a lack of R programming skill. As a result, we can't develop packages specific to our country, such as ones that pull data from Vietnamese API sources (security prices, films, ...) or adapt visualization tools such as ggplot2 to local needs. Recently I developed my own package, VNDS (https://github.com/phamdinhkhanh/VNDS), to serve people who work in the Vietnamese financial sector. I want to make its code cleaner and adopt the methodology of the tidyverse, but I don't know how to go about it. I think the first thing I should do is learn how the tidyverse authors built their packages, but so far I have come up empty. Could you recommend what I should learn or read to understand the tidyverse internals? (I mean the code inside; I am already comfortable using these packages in practice.) Thanks in advance!

mara · May 2, 2018, 3:33pm

I think Hadley's Advanced R, and R Packages books (both freely available online at the links below) would be good places to start:

https://adv-r.hadley.nz/

You can also, of course, look at the source code itself on GitHub to get an idea about some of the internals.

olyerickson · May 2, 2018, 4:11pm

Mara's recommendation of those two key Hadley books is excellent.

You should approach package design from the perspective of a user. Think of the best (most useful, etc.) packages you've used; what makes them great? Try to emulate those best practices.

Good luck!

nwerth · May 3, 2018, 8:03pm

This is great advice, and more easily ignored than one might expect. Any task worth writing a package for requires some complex coding, but that should be on you, not the user. Your users expect simplicity like

result <- really_complex_task(my_data)

They're unlikely to bother with anything more complex.
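
As a rough sketch of what that can look like inside a package: the messy steps live in small internal helpers, and the user only ever sees one function. Everything here is hypothetical and for illustration only (the names daily_returns, clean_prices, and add_returns, and the assumed date and close columns); it is not taken from VNDS or any existing package.

```r
# Hypothetical internal helper: drop missing prices and sort by date.
# In a package this would not be exported.
clean_prices <- function(prices) {
  prices <- prices[!is.na(prices$close), , drop = FALSE]
  prices[order(prices$date), , drop = FALSE]
}

# Hypothetical internal helper: add simple period-over-period returns.
add_returns <- function(prices) {
  n <- nrow(prices)
  r <- c(NA_real_, diff(prices$close) / head(prices$close, -1))
  prices$return <- r[seq_len(n)]  # seq_len() keeps the 0-row case valid
  prices
}

# The single function a user would call.
daily_returns <- function(prices) {
  stopifnot(
    is.data.frame(prices),
    all(c("date", "close") %in% names(prices))
  )
  add_returns(clean_prices(prices))
}

# Usage: one call, regardless of how much happens underneath.
my_data <- data.frame(
  date  = as.Date(c("2018-05-03", "2018-05-02", "2018-05-04")),
  close = c(110, 100, 121)
)
result <- daily_returns(my_data)
```

The input checks sit at the top of the exported function, so a user who passes the wrong thing gets an error right away instead of a confusing failure deep inside a helper.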

Also, it may be fun to mess around with tidy eval, but in most cases:

The column names of datasets are already known. This is especially true in subject-specific packages, like for an API. You could even make your own subclasses of data.frame to make sure you know the names.
Functions taking a data.frame and column names can usually be rewritten to just take vectors. This also makes them more flexible.
If you absolutely need flexibility in working with datasets, I suggest the seplyr package. I find it much easier to program with than rlang's non-standard evaluation model.
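
The first two points can be sketched in base R. This is a minimal illustration under assumptions, not a recommended design: the class name price_tbl, the constructor new_price_tbl, and the helper pct_change are all made up for the example, as are the date and close columns.

```r
# Hypothetical constructor for a data.frame subclass. Because the package
# builds the object itself, the column names are always known.
new_price_tbl <- function(date, close) {
  stopifnot(
    inherits(date, "Date"),
    is.numeric(close),
    length(date) == length(close)
  )
  out <- data.frame(date = date, close = close)
  class(out) <- c("price_tbl", class(out))
  out
}

# Takes a plain numeric vector instead of a data.frame plus a column name,
# so it works on any vector, not just on columns of one dataset.
pct_change <- function(x) {
  stopifnot(is.numeric(x))
  if (length(x) == 0) return(numeric(0))
  c(NA_real_, diff(x) / head(x, -1))
}

# S3 method for the subclass: it can refer to object$close directly,
# so no non-standard evaluation is needed.
summary.price_tbl <- function(object, ...) {
  r <- pct_change(object$close)
  c(n = nrow(object), mean_return = mean(r, na.rm = TRUE))
}

prices <- new_price_tbl(as.Date("2018-05-01") + 0:2, c(100, 110, 121))
pct_change(prices$close)  # works on a column
pct_change(c(2, 4, 8))    # works equally well on a bare vector
summary(prices)           # dispatches to summary.price_tbl
```

Users who prefer dplyr can still write mutate(prices, ret = pct_change(close)) themselves; the package code never has to capture column names.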