For ETLs, when does it make sense to write functions, tests, packages, etc., and where should they live?

its.me.adam · August 8, 2023, 3:33pm

I have an ETL from which I've abstracted out 10 functions. Apart from making the code testable for robustness, their sole purpose is to improve the readability of the ETL code.

I'm curious whether this follows best practice. With a package of functions, I now have to decide where to put the package so that the automated ETL can access it. Ideally, the ETL lives and runs on Posit Connect, or on GitHub via GitHub Actions.

The package contains an internal data set, but I could turn that into a pin and host it on Connect. That way I could publish the package on CRAN and make it accessible to Connect, Docker, and GitHub Actions.

These functions are fairly short. I have a totally separate ETL that uses fewer functions (say 5). For this latter ETL, I figure it makes sense to simply define the functions and their tests at the top of the ETL script and avoid the whole internal-package situation.
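
To make that second option concrete, here is a minimal sketch of what I have in mind, assuming the ETL is a plain R script. The helper names (`clean_names`, `drop_missing_keys`) are hypothetical stand-ins, not my real functions, and the tests use base R's `stopifnot()` rather than a testing package.

```r
# Hypothetical helpers defined at the top of the ETL script.
# Names and logic are placeholders for illustration only.

# Trim surrounding whitespace, replace inner whitespace with "_",
# and lower-case a character vector of column names.
clean_names <- function(x) {
  tolower(gsub("\\s+", "_", trimws(x)))
}

# Drop rows where the given key column is missing.
drop_missing_keys <- function(df, key) {
  df[!is.na(df[[key]]), , drop = FALSE]
}

# Inline tests: stopifnot() aborts the script if any check fails,
# so a broken helper stops the run before any data is touched.
stopifnot(
  identical(clean_names(c(" First Name", "AGE ")), c("first_name", "age")),
  nrow(drop_missing_keys(data.frame(id = c(1, NA, 3)), "id")) == 2
)

# ... extract / transform / load steps would follow here ...
```

Because the checks run on every execution, a failing helper would stop the job early, at the cost of a longer script and tests that cannot be run separately from the ETL.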

What do you think?
