Strategies to manage caller environments in nested checker functions

arangaca · February 19, 2025, 6:51pm

I have a package with many checker functions (built on rlang::abort()) that display the name of the calling function in error messages.

Although I try to abort as early as possible, some of these checker functions are nested more or less deeply in various R6 methods. For example, a public method may rely on a base private method, which calls another method to process some data, which in turn wraps a method that performs a data type check, sometimes with intermediate helper functions that use map() or walk().

Currently, I'm relying on the following function as the default argument in abort() to bubble up to the user environment and retrieve the right function call:

function() {
  caller_env(sys.parent())
}

That works in some cases and fails in others (in particular inside walk()/map() calls), so it's unreliable in general.

I'd like to improve that aspect to ensure error messages always show the right call.

The obvious solution is to pass the calling environment along through every function, but that's quite messy in my case because of the number of functions/methods to change, and it feels somewhat unnatural to add this kind of parameter to some functions.
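A minimal sketch of that explicit approach, assuming a hypothetical checker check_string(), an intermediate helper process(), and a user-facing function shout() (none of these come from the package discussed here). Every layer between the user-facing function and the checker has to accept a call argument and forward it:

```r
library(rlang)

# Hypothetical checker: `call` defaults to the frame that called the checker.
check_string <- function(x, call = caller_env()) {
  if (!is_string(x)) {
    abort("`x` must be a string.", call = call)
  }
  invisible(x)
}

# Hypothetical intermediate helper: it must accept `call` and forward it,
# otherwise the error would be attributed to process() itself.
process <- function(x, call = caller_env()) {
  check_string(x, call = call)
  toupper(x)
}

# Hypothetical user-facing function: its frame becomes the error call.
shout <- function(x) {
  process(x)
}

err <- tryCatch(shout(1), error = function(e) e)
conditionCall(err)
#> shout(1)
```

The cost is visible even in this tiny case: process() gains a parameter that has nothing to do with its own job.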

Another solution (that I don't like) would be to use a fixed number of frames to retrieve the calling environment from deeper in the stack, so that I don't have to add a new parameter to lots of functions. This is very fragile though.

I could also check arguments directly in all public methods, but that's also fragile when a common helper is used in various places because it's easy to forget some checks.
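To illustrate the fragility, here is a small sketch with hypothetical functions (check_positive(), helper(), api_a(), api_b(), wrapper()). The checker hard-codes "two frames up", which is only right when it sits exactly two calls below the user-facing function:

```r
library(rlang)

# Hypothetical checker that hard-codes "two frames up" as the error call.
check_positive <- function(x) {
  call <- caller_env(2)
  if (x <= 0) {
    abort("`x` must be positive.", call = call)
  }
  invisible(x)
}

# Depth matches the assumption: api_a() -> helper() -> check_positive()
helper <- function(x) check_positive(x)
api_a <- function(x) helper(x)

# Depth does not match: api_b() -> check_positive(),
# so the error is attributed to whatever called api_b().
api_b <- function(x) check_positive(x)
wrapper <- function(x) api_b(x)

call_a <- conditionCall(tryCatch(api_a(-1), error = function(e) e))
call_b <- conditionCall(tryCatch(wrapper(-1), error = function(e) e))

call_a
#> api_a(-1)
call_b
#> wrapper(-1)
```

In the second case the message would name wrapper() even though api_b() is the function that owns the check, and any refactoring that adds or removes a layer changes the result silently.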

Are there better alternatives?

Scarletios · February 20, 2025, 6:01am

A more reliable approach would be to use rlang::caller_env() rather than manually specifying sys.parent() or a fixed frame. caller_env() is specifically designed to reliably capture the caller environment, even in nested or iterated function calls (like within map() or walk()). You can modify your abort() calls to use rlang::caller_env() which handles environment propagation automatically without needing to pass the environment explicitly. This approach should eliminate the need to pass the environment manually, ensuring that your error messages reflect the correct calling function in all scenarios.

arangaca · February 20, 2025, 7:04pm

I already use rlang::caller_env() as you can see in my previous post. caller_env() is just a wrapper around parent.frame() though. It doesn't bubble up to the user environments, at least not automatically as you can see in the following example:

foo <- function(call = rlang::caller_env()) {
  rlang::abort("Oops!", call = call)
}

bar <- function() {
  purrr::walk(seq_len(2), \(i) {
    foo()
  })
}

baz <- function(call = rlang::current_env()) {
  purrr::walk(seq_len(2), \(i) {
    foo(call)
  })
}

tryCatch(bar(), error = \(e) e$parent$call)
#> .f(.x[[i]], ...)
tryCatch(baz(), error = \(e) e$parent$call)
#> baz()

Created on 2025-02-20 with reprex v2.1.1

When there are intermediate frames, you need to pass the environment along through the different functions (or get the caller frame using a fixed number).

Here's the backtrace of the most deeply nested check in my package:

aut <- plume::Plume$new(data.frame(
  given_name = "A",
  family_name = "B",
  orcid = "1"
))
aut$get_author_list("o")

Backtrace:
    ▆
 1. └─aut$get_author_list("o")
 2.   └─private$get_author_list_suffixes(suffix) at plume/R/plume.R:146:9
 3.     └─plume:::add_suffixes(out, vars, symbols) at plume/R/plume.R:301:7
 4.       ├─plume:::without_indexed_error(...) at plume/R/utils-tbl.R:33:3
 5.       │ └─base::withCallingHandlers(...) at plume/R/checkers.R:85:3
 6.       └─purrr::iwalk(...)
 7.         └─purrr::walk2(.x, vec_index(.x), .f, ...)
 8.           └─purrr::map2(.x, .y, .f, ..., .progress = .progress)
 9.             └─purrr:::map2_("list", .x, .y, .f, ..., .progress = .progress)
10.               ├─purrr:::with_indexed_errors(...)
11.               │ └─base::withCallingHandlers(...)
12.               ├─purrr:::call_with_cleanup(...)
13.               └─plume (local) .f(.x[[i]], .y[[i]], ...)
14.                 └─plume:::add_orcid_icons(data, key, value) at plume/R/utils-tbl.R:38:7
15.                   └─plume:::make_orcid_icon(data[[col]], attributes(orcid)) at plume/R/utils-tbl.R:58:3
16.                     └─plume:::make_orcid_uri(orcid) at plume/R/icon.R:120:3
17.                       └─plume:::check_orcid(x) at plume/R/icon.R:114:3
18.                         └─plume:::abort(...) at plume/R/checkers.R:288:3
19.                           └─rlang::abort(msg, call = call) at plume/R/checkers.R:98:3

The function checks the validity of ORCID IDs. I need to perform this check in 2 functions. For convenience I placed check_orcid() a bit deeper in the call stack so I only need to write it once. That could be improved but there would still be many calls above in which check_orcid() doesn't belong. Adding a call parameter in every function above specifically for check_orcid() doesn't feel natural either.

arangaca · February 24, 2025, 6:34pm

Another solution I was thinking of is to bind the caller environment to a hook in the function executed by the user and let checker functions retrieve that hook to print the right caller name. I didn't know how to do it but found some clues in the 7th section of the Advanced R book. I think this is also what some dplyr functions do.

I've been experimenting a bit but I can't manage to make it work.

local_error_call <- function(frame = rlang::caller_env(), call = frame) {
  frame$error_call_ <- call
}

get_error_call <- function(call = rlang::caller_env()) {
  if (identical(call, globalenv())) {
    stop("Not found", call. = FALSE)
  } else if (is.null(call$error_call_)) {
    get_error_call(rlang::env_parent(call))
  } else {
    call$error_call_
  }
}

foo <- function() {
  local_error_call()
  bar()
}

bar <- function() {
  baz()
}

baz <- function() {
  call <- get_error_call()
  rlang::abort("Oops!", call = call)
}

foo()
#> Error: Not found

Created on 2025-02-24 with reprex v2.1.1

Any guidance/help would be appreciated.

arangaca · February 28, 2025, 10:15pm

After reading a bit more about the differences between environments and frames, I came up with a solution that works:

local_error_call <- function(frame = rlang::caller_env(), call = frame) {
  frame$error_call_ <- call
}

get_error_call <- function(call = rlang::caller_env()) {
  frames <- as.list(sys.frames())
  for (frame in frames) {
    caller_call <- frame$error_call_
    if (!is.null(caller_call)) {
      call <- caller_call
      break
    }
  }
  call
}

foo <- function() {
  rlang::abort("Oops!", call = get_error_call())
}

bar <- function() {
  local_error_call()
  purrr::walk(seq_len(2), \(i) {
    foo()
  })
}

tryCatch(bar(), error = \(e) e$parent$call)
#> bar()

Created on 2025-02-28 with reprex v2.1.1
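For anyone comparing the two attempts, the difference between enclosing environments and call frames can be checked with a small sketch (inner() and outer() are hypothetical names). rlang::env_parent() follows the chain of environments in which functions were defined, which is why the first version went straight from the frame of baz() to the global environment, whereas caller_env() and sys.frames() follow the call stack:

```r
library(rlang)

inner <- function() {
  list(
    enclosing = env_parent(current_env()),  # where inner() was defined
    caller = caller_env()                   # frame that called inner()
  )
}

outer <- function() {
  here <- current_env()
  res <- inner()
  c(
    enclosing_is_definition_env = identical(res$enclosing, environment(inner)),
    caller_is_outer_frame = identical(res$caller, here),
    enclosing_is_outer_frame = identical(res$enclosing, here)
  )
}

res <- outer()
res
#> enclosing_is_definition_env       caller_is_outer_frame
#>                        TRUE                        TRUE
#>    enclosing_is_outer_frame
#>                       FALSE
```

Note that the loop in get_error_call() scans sys.frames() from the oldest frame to the newest, so when several frames on the stack have set error_call_, the outermost one is the one reported.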
