A directory that is an RStudio Project will contain an .Rproj file. Typically, if the directory is named “foo”, the Project file is foo.Rproj. And if that directory is also an R package, then the package name is usually also “foo”. The path of least resistance is to make all of these names coincide and to NOT nest your package inside a subdirectory inside the Project. If you settle on a different workflow, just know it may feel like you are fighting with the tool.
Within our teams we very often find ourselves working on small, very domain-specific sets of functions, along with scripts, notebooks, reports, Shiny apps and Plumber APIs that are closely tied to those functions.
In these situations, we've gotten into the habit of:
putting all those connected things in a single git repo (which ensures they get versioned together),
using a /package folder inside that repo for the functions (with DESCRIPTION, /R and /man inside it),
using top-level /scripts, /reports, /apps and /apis folders for the consumers of the package functions, and
using renv on the entire repo.
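Concretely, a repo in that shape looks something like this (names are illustrative, not prescriptive):

```
my-repo/                  # git repo root; renv initialised here
├── renv.lock
├── package/              # the R package itself
│   ├── DESCRIPTION
│   ├── R/
│   └── man/
├── scripts/              # consumers of the package functions
├── reports/
├── apps/
└── apis/
```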
This seems to be working OK for us - especially as we've got build pipelines which auto-increment the package version number, deploy that to our local CRAN, and then auto-deploy the reports and apps using the latest checked-in package.
The only tool fighting we seem to be doing is that people have to remember to use e.g. devtools::load_all("package") rather than just devtools::load_all().
But I'm wondering:
what "better" ways of structuring this are there?
what tools should we be fighting that we clearly are missing?
We do have one repo (out of many) with multiple packages, but generally we're keeping it 1:1.
I guess we could have the reports, scripts, apps and apis as sub-folders of the package... but they all use the package, rather than being part of it - what's the logic/incentive in putting them inside the package structure/folder?
That you can call devtools::load_all(), which is a keypress in RStudio, and probably (well, hopefully) nothing else changes in your workflow.
Btw. you can also call load_all() from scripts within the package, and the scripts can be in subdirectories, because load_all() searches the parent directories recursively until it finds the package root.
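A minimal sketch of the difference, assuming the scripts are nested inside the package tree (file and folder names are illustrative):

```r
# pkg/scripts/explore.R -- pkg/ contains DESCRIPTION, R/, etc.
# With the working directory anywhere inside pkg/, no path is needed:
# load_all() walks up parent directories until it finds DESCRIPTION.
devtools::load_all()

# From a sibling folder outside the package tree, the upward search
# never reaches a DESCRIPTION, so the path must be given explicitly:
devtools::load_all("package")
```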
I have this organization, and it works for me pretty well: containers/website at main · r-hub/containers · GitHub
(Although admittedly, this repo / package is pretty simple at this point, and there are no other subdirectories currently.)
I can declare dependencies in DESCRIPTION and I call load_all() from the quarto files.
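For reference, the setup chunk in such a quarto file can be as small as this (file name and chunk label are illustrative; this assumes the .qmd lives somewhere inside the package tree):

```r
# First R chunk of reports/analysis.qmd -- because the file is rendered
# from inside the package tree, load_all() finds the package root by
# walking up from the render directory, so no path argument is needed.
devtools::load_all()
```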
But just to check - that's it - the only advantage is using devtools::load_all() rather than devtools::load_all("path_to_package")?
... and if we are using e.g. a reports folder rather than putting everything in the root, then we'll still need to use a path anyway (because knitr knits with the reports folder as the working directory)?
Well, you said that was your only problem, so I tried to solve that:
I don't follow this, sorry. If you have an .Rmd in the reports/ folder, then you can call devtools::load_all() in the .Rmd without specifying the path. load_all() will by default look at the parent directory for DESCRIPTION (recursively) if it cannot find it in the current directory. Isn't this what you want?
Thanks - I didn't know that the path parameter was "Path to a package, or within a package" - that's useful to know.
I think the overall thing I'm asking about is whether there's something we're missing... When I said "The only tool fighting we seem to be doing" what I meant was "we're not seeing any significant tool fighting"... so I was guessing (perhaps wrongly) that we might be missing tools we could/should be using?
I guess the original R Packages text did only say "it may feel like"... so maybe we should just ignore it and carry on.
To be clear, my advice coincides with the advice from the book: put your package at the top level.
It is not completely intuitive, but it is better to nest your scripts, Rmd documents, etc. inside the package than to nest the package inside your project.