I tried to slim down your code, moving data preparation into the folder `/data/` and utility functions into the folder `/R/`. The file `/analyses.R` reproduces your analyses (clearly, the data are not stored, so the functions don't actually run).
You can find everything here.
IMO, `analyses.R` files should be used to produce analyses only. That makes the code much easier for humans to understand (including me, or collaborators, maybe in the future). I have found that I spend much more time (re-re-re-)reading my code (to understand, correct, explain, ... it) than writing it. I.e., I scroll much more than I type! Hence, I put all my effort into writing code that lets me spend less time reading and understanding it in the future!
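A minimal sketch of what an analyses-only `analyses.R` could look like, assuming the `/data/` + `/R/` layout described above. The first part only builds a throwaway toy project in `tempdir()` so the sketch runs anywhere; the helper `summarise_by_group()` and the file `clean_data.rds` are hypothetical names used for illustration.

```r
# --- Setup (hypothetical): a toy project skeleton, only so this sketch runs ---
proj <- file.path(tempdir(), "toy_project")
dir.create(file.path(proj, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(proj, "data"), showWarnings = FALSE)

# A hypothetical utility function living in R/
writeLines(
  "summarise_by_group <- function(df) aggregate(value ~ group, data = df, FUN = mean)",
  file.path(proj, "R", "summarise.R")
)
# Hypothetical prepared data, as if produced by the scripts in data/
saveRDS(
  data.frame(group = c("a", "a", "b"), value = c(1, 3, 5)),
  file.path(proj, "data", "clean_data.rds")
)

# --- analyses.R proper: load helpers and data, then run analyses only ---
helper_files <- list.files(file.path(proj, "R"), pattern = "\\.[Rr]$", full.names = TRUE)
for (f in helper_files) source(f)  # defines summarise_by_group() in the global env

dat <- readRDS(file.path(proj, "data", "clean_data.rds"))
res <- summarise_by_group(dat)
print(res)
```

Everything that defines or prepares something lives elsewhere; `analyses.R` just reads like a list of steps.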
Moreover, by splitting out everything that "does something" into its own function, the code gains many bonuses:
- you can replicate function calls without repeating all their code (e.g., using the `purrr` package);
- you can test the exact functionality of each function (e.g., using the `testthat` package);
- if you need to change something, you have to change it in one place only (avoiding many checks and possible sources of bugs).
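A small sketch of the first two bonuses, assuming `purrr` and `testthat` are installed. The helper `standardise()` is hypothetical and `mtcars` is just a toy data set; the point is that one function is written once, reused via `purrr::map()`, and tested in isolation with `testthat::test_that()`.

```r
# A small, single-purpose helper (hypothetical): standardise a numeric vector
standardise <- function(x) {
  stopifnot(is.numeric(x), length(x) > 1)
  (x - mean(x)) / sd(x)
}

# 1. Replicate calls without repeating code:
#    apply the same helper to several columns of a data set
numeric_cols <- mtcars[, c("mpg", "hp", "wt")]
scaled <- purrr::map(numeric_cols, standardise)   # named list, one vector per column
means_after <- purrr::map_dbl(scaled, mean)       # each mean is ~0 after standardising

# 2. Test the exact behaviour of the helper on its own
testthat::test_that("standardise() returns mean 0 and sd 1", {
  z <- standardise(c(2, 4, 6, 8))
  testthat::expect_equal(mean(z), 0)
  testthat::expect_equal(sd(z), 1)
})

testthat::test_that("standardise() rejects non-numeric input", {
  testthat::expect_error(standardise(letters))
})
```

If the definition of "standardise" ever changes, only `standardise()` needs editing, and the tests immediately tell you whether the new version still behaves as expected.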
As the code and the analyses become more sophisticated (or repeated!), I think a package is a better environment for the work. Usually, I directly create a package for every analysis or project (unless it is a very small one, both in size and duration). Once you gain some practice, setting up a package takes no more than a couple of hours (often much less: `usethis::create_package()` creates the basic skeleton in a few seconds...). A package also gives you many tools for automating tests. With a package, all the code is "packaged," so it is easier to share the whole project or to ask someone else to contribute. There are also fewer constraints on the names of internal functions:
- they cannot conflict with anything external (if they are not exported);
- external functions cannot conflict with the internal ones (functions that are neither internal nor in `base` must be called with their package prefix, e.g., `dplyr::select()`).
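A minimal sketch of scaffolding an analysis package with `usethis`, assuming `usethis` is installed. The package name `myanalysis` and the file name `prepare_data` are hypothetical; the package is created in a temporary folder only so the sketch can run without touching your own projects.

```r
# Hypothetical location and name for the new package
pkg_path <- file.path(tempfile(), "myanalysis")

# Creates DESCRIPTION, NAMESPACE and an empty R/ folder
usethis::create_package(pkg_path, open = FALSE)

# Run the next helpers with the new package as the active project
usethis::with_project(pkg_path, {
  usethis::use_r("prepare_data", open = FALSE)     # R/prepare_data.R for your functions
  usethis::use_testthat()                          # tests/testthat/ + testthat in Suggests
  usethis::use_test("prepare_data", open = FALSE)  # tests/testthat/test-prepare_data.R
})

list.files(pkg_path, recursive = TRUE)
```

From here, functions go in `R/`, their tests in `tests/testthat/`, and calls to other packages are written with the `pkg::` prefix.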
I can recommend "How to write readable code" by Dustin Boswell. It was an eye-opener for me!