Working with the targets package and package development

Greetings, I'm trying to figure out whether the targets package is suited to our needs. I currently maintain a fairly large R package with a vignette that takes thirty minutes to an hour to run for bigger analyses, and users regularly rerun these analyses. Based on the targets package description, it seems ideal. I've run through the example and tried applying it, but I'm somewhat stuck on how this will scale within a large R package.

  1. Is there an example of using the targets package inside another package to speed up a single function? How would this work with multiple functions?
  2. I have many functions, and many of them will likely use the same object names. How will targets handle this? It seems the target objects are all stored in the same folder, and it's less than ideal for us to have to name every target object uniquely across functions.
  3. It seems that to read target objects you must call tar_read(). If an object is read with that function and then manipulated, do we have to write it back for the targets pipeline to pick up the change?
file_path <- "folder/folder/file"
data <- file_read(file_path)  # file_read() stands in for whatever reader we actually use
  • Say we have target objects that depend on the data object above. The code above always runs. Will the targets pipeline recognize whether the data is the same or has changed? (See the sketch just below this list.)
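From the docs it looks like the idiomatic way to declare this is format = "file", which makes the pipeline hash the file and rerun downstream targets only when its contents change. A minimal sketch of what I have in mind (file_read() is still a placeholder for our actual reader):

# _targets.R (sketch): track the input file itself as a target
library(targets)
list(
  # format = "file" hashes the file's contents, so downstream
  # targets rerun only when the file actually changes
  tar_target(raw_file, "folder/folder/file", format = "file"),
  tar_target(data, file_read(raw_file))  # placeholder reader
)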

Thanks for any and all help on this; I'm excited about the possible benefits of the targets package! I've tried sifting through the documentation, so let me know if I missed something.

This use case is tricky. I think what you are looking for is memoization, i.e. caching for an individual function. This is conceptually different from the way targets works. In targets, you deal with the full end-to-end pipeline all together, with all the pieces that depend on each other. Each of the targets is a call to a function, and results are cached at the level of calls to functions rather than the functions themselves. And targets takes into account how all the computations in the pipeline are connected and interdependent, which is tough to integrate into the way people usually write vignettes for package documentation. (Although targets does have ways to integrate with literate programming in general.)
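To make that concrete, here is a minimal sketch of a _targets.R pipeline; the package name and the functions analyze() and summarize_results() are placeholders, not anything from this thread. Each target is one function call, and targets tracks how the calls depend on each other:

# _targets.R: a minimal end-to-end pipeline sketch
library(targets)
tar_option_set(packages = "yourpackage")  # placeholder package name
list(
  tar_target(raw, read.csv("data.csv")),       # upstream data
  tar_target(fit, analyze(raw)),               # reruns only if raw changes
  tar_target(results, summarize_results(fit))  # reruns only if fit changes
)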

One approach could be to create a pipeline completely independent of the vignette, run it with the data store set to somewhere in tools::R_user_dir(), and then let the vignette grab bits of pre-computed output using tar_read(). Or, your package could supply target factories to make it easier for users to write their own pipelines that use your methods.
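A rough sketch of that first approach; the store path, script location, and the target name fit are all illustrative:

# Run the pipeline against a persistent, user-level data store,
# so repeated vignette builds can skip up-to-date targets.
library(targets)
store <- file.path(tools::R_user_dir("yourpackage", "cache"), "targets")
tar_make(script = "pipeline/_targets.R", store = store)
# Then the vignette grabs pre-computed output from that store:
fit <- tar_read(fit, store = store)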

Also, in package vignettes I generally try to reduce the computation time (fewer MCMC iterations, smaller data, etc.). The results don't always need to be exactly correct as long as you say so explicitly; sometimes you can use a vignette just to communicate how to use the package. If it still takes a long time to run, you can move the package documentation to an external website such as a Quarto book. This is why the chapters of the targets user manual live in their own repository instead of the package vignettes.
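For instance, a vignette chunk might run a deliberately tiny version of the analysis; run_mcmc(), full_data, and the settings below are placeholders:

# Deliberately small settings so the vignette builds quickly; say in
# the prose that real analyses need more data and iterations.
small_data <- head(full_data, 100)                   # placeholder data
fit <- run_mcmc(small_data, iter = 200, chains = 1)  # placeholder function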

If relevant, here's a blog post about caching.

Thank you both @wlandau & @maelle!
