I'm experimenting with project structures for my current research. I used to use two distinct projects for the same research:
One project (A) for the actual analyses, manuscript, data, etc.
Another project (B) built as an R package for ad-hoc helper functions used in project A.
I've recently read about research compendiums, which seem to be a better alternative to keep everything related to the same research in one place.
I've a few questions regarding non-standard directory/file names and renv/DESCRIPTION's imports.
1:
My project has scripts/ (for analyses) and data/ (for raw and processed data sets) directories. data/ contains .parquet files (locally) and a README.md (to keep the project structure on GitHub, since I don't commit data to the GitHub repo).
This makes R CMD check unhappy, either because of unexpected directory names (e.g. scripts) or disallowed file formats in data/.
Is there any recommended way to handle this? E.g., should I simply ignore the R CMD check warnings in this case? Or should I put the data sets in a custom directory, e.g. _data/, which will only generate a note as opposed to a warning? Or maybe put all non-standard directories/files below inst/?
2:
I use the DESCRIPTION Imports field to specify the required packages both for the analyses and helper functions. Some of these packages are only required to run the analyses or render the manuscript. Having packages not used in files below R/ also generates an R CMD check note, which I should probably just ignore.
But out of curiosity, is it preferable to use Imports only for packages the helper functions depend on, and renv for the rest? Or keep everything in Imports and use renv on top of it?
Note that I'll most likely rely on devtools::load_all() to use the helper functions, rather than installing and loading the project as an actual package.
I would recommend adding any non-package file or directory to .Rbuildignore. This will exclude them from the R package tarball, and thus R CMD check will never see them. From Writing R Extensions:
To exclude files from being put into the package, one can specify a list of exclude patterns in file .Rbuildignore in the top-level source directory. These patterns should be Perl-like regular expressions (see the help for regexp in R for the precise details), one per line, to be matched case-insensitively against the file and directory names relative to the top-level package source directory.
If you are unfamiliar with the regular expression syntax required for this file, you can use the convenience function usethis::use_build_ignore(), which will automatically apply the correct syntax.
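As a rough sketch, assuming the directory names from the question, the resulting .Rbuildignore could look something like the block below. Running usethis::use_build_ignore(c("scripts", "data")) should write the first two entries in this anchored form. The last two lines are an assumption for a project that also uses {renv}. The file has no comment syntax (every line is treated as a pattern), so the caveats are given here rather than inline.

```
^scripts$
^data$
^renv$
^renv\.lock$
```

Because the pattern for data matches the directory itself, everything below it (the .parquet files and the README.md) is left out of the tarball as well.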
I would recommend putting all the packages required for the analysis but not for the R functions in Suggests. This should remove the R CMD check note and also make it convenient for you and others to install the packages required to run the analyses. For example, if you run pak::pak() inside a package, it will also install all the packages in Suggests.
Though to clarify, if you are using {renv}, I think you should use that for all your dependencies. The benefit of also including the packages in DESCRIPTION is 1) to document which packages are needed for the package functions (and to help R CMD check alert you about any real problems), and 2) to allow another user to install the required dependencies if they would prefer not to use {renv}.
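To make the split concrete, here is a minimal sketch of the relevant DESCRIPTION fields. The package names are placeholders I picked for illustration, not taken from your project: dplyr stands in for whatever the helper functions in R/ call, arrow for reading the .parquet files in the analysis scripts, and rmarkdown for rendering the manuscript.

```
Imports:
    dplyr
Suggests:
    arrow,
    rmarkdown
```

With a layout like this, R CMD check only expects the Imports entries to be used by code below R/, while {renv} can still pin exact versions of everything in renv.lock.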
That's great! I've never really dug into .Rbuildignore and how it actually works but that's super useful.
That's the part I'm struggling with the most. Currently I have tidyverse in Depends (I read that this is recommended by the developers, or at least an okay thing to do, when using a research compendium; sadly I can't remember where). What I need for the helpers and to reproduce the crucial part of the research, e.g. analyses/results, is in Imports. I defined the rest in Suggests. I guess I'll have to experiment and see what works best for me.
I like what you suggest though and it's probably what makes the most sense so I'll definitely give it a try.