Run reticulate::py_require() once on package installation

rjake · June 12, 2025, 5:49pm

We have an internal Python library that we want to build an R package/wrapper for. The Python package only needs to be installed once, not every time the R package is loaded. The documentation says to use py_require() over virtualenv_create(), but the former creates a temp environment and doesn't work if the user is offline or not behind the firewall. It also slows down dashboards, etc., due to how it downloads the python pkgs each time the R package is loaded.

What it seems the docs suggest:

.onLoad <- function(libname, pkgname) {
  # Declare the Python requirements as a character vector
  reticulate::py_require(
    c(
      "git+https://github.com/astral-sh/ruff",               # reprex
      # "git+https://github.our-enterprise-account/py/pkg",  # actually need
      "pandas",
      "pyarrow"
    )
  )
}

We have chosen the virtualenv route instead and created a function to install the packages, so that .onLoad() looks like this:

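.onLoad <- function(libname, pkgname) {
  # virtual_env_path and virtualenv_name are assumed to be defined elsewhere in the package
  if (!reticulate::virtualenv_exists(virtual_env_path)) {
    return(message("First run 'install_our_pkg()'"))
  }
  reticulate::use_virtualenv(virtualenv_name)
}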
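For reference, a minimal sketch of what an install_our_pkg() helper along these lines might look like. The environment name "r-our-pkg" and the default package list are placeholders, not the real internal values; the only reticulate calls used are virtualenv_exists() and virtualenv_create().

```r
# Hypothetical one-time installer; envname and packages are illustrative defaults.
install_our_pkg <- function(envname = "r-our-pkg",
                            packages = c(
                              "git+https://github.com/astral-sh/ruff",  # stand-in for the internal repo
                              "pandas",
                              "pyarrow"
                            )) {
  # Skip the install if the environment is already there
  if (reticulate::virtualenv_exists(envname)) {
    message("Virtualenv '", envname, "' already exists; nothing to do.")
    return(invisible(envname))
  }
  # Create the environment and pip-install the requested packages into it
  reticulate::virtualenv_create(envname, packages = packages)
  invisible(envname)
}
```

The user would call install_our_pkg() once after installing the R package; later loads would then only hit the virtualenv_exists() check in .onLoad().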

What is the best practice here?

t-kalinowski · June 12, 2025, 6:08pm

Both approaches seem fine to me, though the second approach won't work well if multiple R packages take this approach.

Just to clarify:

> downloads the python pkgs each time the R package is loaded.

The Python package should only be downloaded if a new version is available. If you're pointing py_require() at the head of a frequently updated repo, you might prefer to pin a specific git tag.
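As a rough sketch of pinning to a tag: pip-style git requirements accept an "@<ref>" suffix after the repo URL. The helper name pin_git_ref() and the tag "v1.2.3" below are made up for illustration; substitute a tag that actually exists in your repo.

```r
# Hypothetical helper: build a "git+<url>@<ref>" requirement string.
pin_git_ref <- function(repo_url, ref) {
  stopifnot(
    is.character(repo_url), length(repo_url) == 1L,
    is.character(ref), length(ref) == 1L, nzchar(ref)
  )
  paste0("git+", repo_url, "@", ref)
}

.onLoad <- function(libname, pkgname) {
  reticulate::py_require(
    c(
      # "v1.2.3" is a placeholder tag, not a real release of this repo
      pin_git_ref("https://github.com/astral-sh/ruff", "v1.2.3"),
      "pandas",
      "pyarrow"
    )
  )
}
```

With a fixed tag the requirement no longer tracks the moving head of the default branch, so there should be nothing new to fetch until you bump the tag yourself.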

> doesn't work if the user is offline or not behind the firewall

Users can set Sys.setenv(UV_OFFLINE=1) if temporarily offline (e.g., on a plane or train). In that case, uv skips checking for updated package versions and just uses the latest from the local cache.
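One way a wrapper package could expose this, sketched under assumptions: the function name enable_uv_offline() and the option name "ourpkg.offline" are hypothetical, and I'm assuming the variable needs to be set before reticulate starts Python, since that is when the environment gets resolved.

```r
# Hypothetical opt-in switch; "ourpkg.offline" is an illustrative option name.
enable_uv_offline <- function(offline = getOption("ourpkg.offline", FALSE)) {
  if (isTRUE(offline)) {
    # Tell uv to skip network access and rely on its local cache
    Sys.setenv(UV_OFFLINE = "1")
  }
  invisible(isTRUE(offline))
}
```

A user would then put options(ourpkg.offline = TRUE) in their .Rprofile, and the package would call enable_uv_offline() at the top of .onLoad(), before py_require().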