Best format for reusable child scripts

ottie · November 27, 2024, 1:11pm

We have a data analytics pipeline, and we're struggling with it being almost but not quite standardised.
We go through various ingestion / cleaning / analytics steps which are often quite similar across different data analysis projects, but there's enough project-specific variation that we can't just functionalise the lot.
Each step is usually a combination of markdown and one or more code chunks.

- some steps are pretty standardised in format (e.g. given our standard data input, plot and tabulate a certain statistic) but they will only be needed in certain projects
- some steps are similar but will need some manual code editing depending on the project - exactly which bit needs editing is variable enough that it's difficult to parametrise
- some steps are completely project-specific
- some steps were developed specifically for one project and later we found we could recycle them for a different one

At the moment, we're doing a lot of copy-pasting (terrible for version control), rewriting the same code (waste of time), working off a "default template" which doesn't capture most of the chunks/modules we've developed over time, and overall being frustrated and unhappy.

We could work toward more functionalisation - however this makes edits tricky, and (personal opinion) I think hiding a lot of processing steps behind a function call can make the pipeline code quite hard to interpret.
We have explored Rmarkdown child documents / Quarto includes - is this the only way? They always feel like a minor functional feature and I'm a bit worried about hanging our entire pipeline structure on them.

What I think I'm envisioning is a "packaging system" for Rmd child documents, with a formalised documentation structure, and the ability to use the step document as-is or with modification. Is there any (non-absurd) way to repurpose an R package to work this way? Are we better off with a child document library repo that we fork for every project? Is there another way to accomplish this that I'm not thinking of?
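
To make the "packaging system" idea a bit more concrete, here is a rough sketch of the sort of thing I have in mind, not a tested design. It assumes the step documents would sit in the `inst/steps/` folder of a package (the package name `pipelinesteps` and the helpers `list_steps()`, `step_path()` and `use_step()` are all made up for illustration). A step is either used as-is by pointing at its path, or copied into the project to be edited there. The demo uses a temporary directory in place of the package so it runs on its own.

```r
# Hypothetical helpers for a "step library" of Rmd child documents.
# In a real package the library directory would be something like
# system.file("steps", package = "pipelinesteps"); that package name is made up.

# List the step documents available in a library directory.
list_steps <- function(library_dir) {
  tools::file_path_sans_ext(list.files(library_dir, pattern = "\\.Rmd$"))
}

# Path to a step, for using it unmodified (e.g. via the knitr `child` option).
step_path <- function(step, library_dir) {
  path <- file.path(library_dir, paste0(step, ".Rmd"))
  if (!file.exists(path)) stop("Unknown step: ", step)
  path
}

# Copy a step into the project so it can be edited and tracked there.
use_step <- function(step, library_dir, project_dir = "steps") {
  dir.create(project_dir, showWarnings = FALSE, recursive = TRUE)
  dest <- file.path(project_dir, paste0(step, ".Rmd"))
  if (file.exists(dest)) stop("Refusing to overwrite ", dest)
  file.copy(step_path(step, library_dir), dest)
  invisible(dest)
}

# Demo: a temporary directory stands in for the package's inst/steps/ folder.
lib <- file.path(tempdir(), "step-library")
dir.create(lib, showWarnings = FALSE)
fence <- strrep("`", 3)
writeLines(
  c("## Summary statistic", "", paste0(fence, "{r}"), "summary(cars)", fence),
  file.path(lib, "summary-stat.Rmd")
)

# A fresh project directory, so re-running the demo never hits an existing copy.
project <- tempfile("project-")
copied <- use_step("summary-stat", lib, project_dir = file.path(project, "steps"))

print(list_steps(lib))
print(basename(copied))
```

In the main document, the as-is case would then be a chunk with `child = step_path("summary-stat", lib)`, and the modified case a chunk with `child = "steps/summary-stat.Rmd"` pointing at the project's own copy. Documentation for each step could presumably live in the package's help pages or a vignette, but I have not worked that part out.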
