Best practices for writing functions that depend on external libraries (not in a package)?

venpopov · November 14, 2024, 7:23pm

When I write a package, it is easy to handle dependencies - if I want to use a function from another package, I either import the package/function or explicitly name it via ::.

I'm wondering what the best way to do that would be in a project that is not a package, but in which I have written a bunch of functions that I then use in .rmd or .qmd reports. Let's say I put my functions in a separate file functions.R, which I source at the beginning of a particular analysis script. I want to use a bunch of functions from dplyr and purrr in my custom functions, but not name them explicitly via :: every time. As I see it I have two options: load the necessary packages in the "functions.R" script or load them in the analysis scripts. Both options seem undesirable.

If I do the first (e.g. library(dplyr) in my functions.R script), then that's a side effect of sourcing functions.R, which I dislike. But if I don't, then the functions will only work in analysis scripts where I remember to import their dependencies. Fine for now, but if I return to this code in the future, there is no indication at all, in the absence of particular analysis scripts, what packages my custom functions depend on.

It seems like the only clean way to do this in a non-package project is to call every function explicitly via ::. Are there other, more efficient ways to do it?

AlexisW · November 15, 2024, 7:45pm

My suggestions: at the top of your script, put explicit calls to library() for all the packages it needs. That way, when you (or someone else) open the script, it's obvious which packages are required, and there is little chance of forgetting to load one. If the list of packages at the top of your script gets too long, consider splitting the script in two.

If a function requires a package, add a call to requireNamespace() inside it and stop with a useful message if the package is not installed. The same pattern is used when writing a package that has an optional dependency listed in Suggests.
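A minimal sketch of that check, assuming a small helper kept in functions.R. The names require_pkg() and mean_by_group() are hypothetical and only there for illustration:

```r
# Hypothetical helper: stop with a readable message if a package is missing.
require_pkg <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      sprintf(
        "Package '%s' is required but not installed. Try install.packages(\"%s\").",
        pkg, pkg
      ),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Hypothetical analysis function that depends on dplyr.
mean_by_group <- function(data, group, value) {
  require_pkg("dplyr")
  grouped <- dplyr::group_by(data, .data[[group]])
  dplyr::summarise(grouped, mean = mean(.data[[value]]), .groups = "drop")
}
```

Note that requireNamespace() loads the package's namespace but does not attach it, so the function body still has to use :: for the calls. What this buys you is a dependency that is documented inside the function and a clear error when it is missing, without the side effect of library().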

A third way is to create one (or several) custom packages that you store on GitHub or similar. Nowadays it's quite easy to write a small personal package that you can keep private and that holds your commonly used functions (with their dependencies declared in DESCRIPTION).
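To give an idea of how little such a package needs, here is a rough sketch that builds the skeleton by hand with base R. The package name myhelpers, the function count_missing() and the temporary location are all made up for the example. In practice, tools such as usethis and roxygen2 are commonly used to generate these files rather than writing them yourself.

```r
# Hypothetical package name and location; a real package would live in its own repo.
pkg_dir <- file.path(tempdir(), "myhelpers")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# DESCRIPTION declares the dependencies once, in the Imports field.
writeLines(c(
  "Package: myhelpers",
  "Title: Personal Helper Functions",
  "Version: 0.0.1",
  "Description: Small utilities shared across analysis projects.",
  "License: GPL-3",
  "Imports: dplyr, purrr"
), file.path(pkg_dir, "DESCRIPTION"))

# NAMESPACE imports the functions the helpers use, so no :: is needed inside them.
writeLines(c(
  "export(count_missing)",
  "importFrom(purrr, map_int)"
), file.path(pkg_dir, "NAMESPACE"))

# R/count_missing.R: an illustrative helper that calls map_int() without ::.
writeLines(c(
  "count_missing <- function(df) {",
  "  map_int(df, function(col) sum(is.na(col)))",
  "}"
), file.path(pkg_dir, "R", "count_missing.R"))
```

Once installed, analysis scripts only need library(myhelpers) (or myhelpers::count_missing()), and the dplyr/purrr dependencies are recorded in one place.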

Actually, I would list this third option as the best practice: imagine you create my_helper_function() and use it in one or two projects. Then you start using it in a third project and realize there was a mistake. Would you reopen projects 1 and 2 to correct it? You might end up with many versions of the same function, all slightly out of date. Or you end up with a single version, and old projects stop working because of the changes. Keeping these utility functions in a package allows you to set an explicit version, to keep track of changes with git, and even to have tests "for free".

And there is little downside to creating a package. Speaking for myself, I regularly use a package with just two functions of a few lines each, and I also made another package mostly to download files I regularly need and whose URL I keep forgetting. And I mostly can just use those without thinking about dependencies etc.
