I have a number of business processes where I regularly cache calculations. Right now I typically run my process daily, and because I don't cache that much data I just save a complete, daily-tagged snapshot of all the results. Now I have a situation where the data changes very slowly and there are quite a few rows of data. Since it moves so slowly, this feels like a slowly changing dimension (SCD) in an ETL process. When I have dealt with these in the past, someone on the database team typically handled it with logic built into the database. Are there any R packages that would help with extracting and snapshotting slowly changing data?
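For context, my current approach is essentially the sketch below; `compute_results()` and the cache directory are stand-ins for my real process:

```r
# Sketch of my current approach: recompute everything daily and save a
# complete snapshot tagged with the date. compute_results() and the
# "cache" directory are stand-ins for the real process.
results <- compute_results()
saveRDS(results, file.path("cache", sprintf("results_%s.rds", Sys.Date())))
```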
I took a peek at the CRAN Task View on Databases and didn't see anything applicable. Python has a few things like pygrametl, but it seems really complicated and possibly overkill.
I don't know of any R-based solutions, but there's a lot to be said for db-side transaction logging whenever you run a calculation based on a data snapshot and save it. See https://mariadb.com/kb/en/library/mariadb-audit-plugin-log-settings/ as one example. In my idealized metaverse, everything an ETL wrote to a db would also include, as a blob if necessary, the code that made extract_{t_i} become load_{t_i}.
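If db-side audit logging isn't an option, you can approximate the "store the code with the load" idea from R itself. A minimal sketch, assuming DBI/RMariaDB; the `etl_audit` table and the script path are hypothetical:

```r
library(DBI)

# Record the ETL script's source next to each load, so that what turned
# extract_{t_i} into load_{t_i} is recoverable later. The etl_audit table
# and the script path are hypothetical.
con <- dbConnect(RMariaDB::MariaDB(), dbname = "warehouse")
dbExecute(con, "
  INSERT INTO etl_audit (loaded_at, script_source)
  VALUES (now(), ?)
", params = list(paste(readLines("etl/load_dimensions.R"), collapse = "\n")))
dbDisconnect(con)
```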
I doubt that a purely R-based solution is feasible.
IMHO the cleanest solution to the SCD2 problem is an UPSERT (INSERT ... ON CONFLICT DO UPDATE), which is a part of SQL that has so far resisted standardization. Any solution is bound to be very platform-specific.
It is best left to the backend and, by extension, to the database people. See the sketch below for what the platform-specific part looks like when driven from R.
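For concreteness, a minimal sketch of issuing an UPSERT from R via DBI, assuming PostgreSQL/RPostgres; the `dim_customer` table and its columns are made up, and MySQL/MariaDB would need `INSERT ... ON DUPLICATE KEY UPDATE` instead:

```r
library(DBI)

con <- dbConnect(RPostgres::Postgres(), dbname = "warehouse")

# PostgreSQL-specific upsert: insert the row, or update the stored
# attribute when the key already exists and the value actually changed.
dbExecute(con, "
  INSERT INTO dim_customer (customer_id, tier, valid_from)
  VALUES ($1, $2, now())
  ON CONFLICT (customer_id)
  DO UPDATE SET tier = EXCLUDED.tier, valid_from = now()
  WHERE dim_customer.tier IS DISTINCT FROM EXCLUDED.tier
", params = list(42L, "gold"))

dbDisconnect(con)
```

Note that a true SCD2 load would keep history, closing out the old row (setting its valid_to) and inserting a new version rather than overwriting in place, but the conflict-handling mechanics are the same platform-specific story.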