Emitting warnings from dplyr code called by another package

Below is some code that emits an unexpected many-to-many relationship warning when run:

x <- data.frame(mycol1 = rep("blah", 4), id = c(6875L, 8978L, 8978L, 23L))
y <- data.frame(mycol1 = rep("blah", 4), id = c(29L, 8978L, 23L, 23L), value = c(0, 19.5, 100.9, 123))
z <- dplyr::inner_join(x, y, by = c("mycol1", "id"))

Warning message:
In dplyr::inner_join(x, y, by = c("mycol1", "id")) :
Detected an unexpected many-to-many relationship between x and y.
ℹ Row 4 of x matches multiple rows in y.
ℹ Row 2 of y matches multiple rows in x.
ℹ If a many-to-many relationship is expected, set relationship = "many-to-many" to silence this warning.

If I wrap this in a function and call that function, I get the same warning:

myfunction <- function() {
    x <- data.frame(mycol1 = rep("blah", 4), id = c(6875L, 8978L, 8978L, 23L))
    y <- data.frame(mycol1 = rep("blah", 4), id = c(29L, 8978L, 23L, 23L), value = c(0, 19.5, 100.9, 123))
    z <- dplyr::inner_join(x, y, by = c("mycol1", "id"))
    return(z)
}
z <- myfunction()

However, if the function is put inside a package and called from there, the warning is never displayed to the user.

The warning is displayed, though, if the function is run via e.g. devtools::test or similar during package development.

My understanding is that once the function is inside a package, its code runs in the package's own environment, whereas sourced code runs in my global environment, and dplyr's warning system uses this context to decide whether to warn about many-to-many joins. That is, I should expect the warning when the function is sourced (treated as user-level code) but not when it is called as a namespaced package function (treated as internal). Meanwhile, devtools::test runs it in a way that still emits the warnings. I also assume that this applies to all dplyr warnings, not just the many-to-many relationship one.

Is this understanding correct?

If so, my question is whether there is a way I can set up a package, or the environment in which a package is run, so that all dplyr warnings will still end up being displayed?

(The full context is that we have various packages that use dplyr for data manipulation. The data often comes from third-party sources, so we very much do want to see the warnings: they most likely indicate a problem with the data that needs investigating and fixing.)

Also, in case it helps, this is what the stack trace looks like when the warning is thrown under devtools::test:

  1. ├─mypkg::myfn(arg1, arg2, arg3) at test.myfn.R:9:1
  2. │ ├─dplyr::inner_join(...) at mypkg/R/myfn.R:41:5
  3. │ └─dplyr:::inner_join.data.frame(...)
  4. │   └─dplyr:::join_mutate(...)
  5. │     └─dplyr:::join_rows(...)
  6. │       └─dplyr:::dplyr_locate_matches(...)
  7. │         ├─base::withCallingHandlers(...)
  8. │         └─vctrs::vec_locate_matches(...)
  9. ├─vctrs:::warn_matches_relationship_many_to_many(...)
 10. │ └─vctrs:::warn_matches_relationship(...)
 11. │   └─vctrs:::warn_matches(...)
 12. │     └─vctrs:::warn_vctrs(...)
 13. │       └─rlang::warn(...)
 14. │         └─base::warning(cnd)
 15. │           └─base::withRestarts(...)
 16. │             └─base (local) withOneRestart(expr, restarts[[1L]])
 17. │               └─base (local) doWithOneRestart(return(expr), restart)
 18. └─dplyr (local) <fn>(<vc______>)
 19.   └─dplyr:::rethrow_warning_join_relationship_many_to_many(cnd, error_call)
 20.     └─dplyr:::warn_join(...)
 21.       └─dplyr:::warn_dplyr(...)

Thanks in advance for any help!

Your understanding is correct, at least for the dplyr many-to-many warning: dplyr does not emit it when the join is called from inside a package. A comment in join-rows.R says:
# Indirect calls don't warn, because the caller is unlikely to have access
# to relationship to silence it.

I'm not sure whether you can control the environment argument the function uses when it is called, but if you look at the source code for join_rows() in join-rows.R, you can see it takes an argument user_env = caller_env().

I know this doesn't solve your problem completely but might at least give you some insight into what to try next.
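One thing to try, independent of dplyr's own warning, is to check the join keys yourself inside the package and raise your own warning. The following is only a minimal base-R sketch: warn_if_many_to_many() is a hypothetical helper (not a dplyr function), and it treats keys as matching when their pasted string forms are equal, which is a simplification of how dplyr compares values.

```r
# Hypothetical helper (not part of dplyr): warn when joining `x` and `y`
# on the columns in `by` would give a many-to-many relationship, i.e. some
# row of `x` matches several rows of `y` AND some row of `y` matches
# several rows of `x`.
warn_if_many_to_many <- function(x, y, by) {
  # One key string per row, built from the join columns.
  # Assumption: comparing pasted strings is good enough for these columns.
  key_x <- do.call(paste, c(unname(as.list(x[by])), sep = "\r"))
  key_y <- do.call(paste, c(unname(as.list(y[by])), sep = "\r"))

  # Keys that occur more than once on each side.
  dup_x <- unique(key_x[duplicated(key_x)])
  dup_y <- unique(key_y[duplicated(key_y)])

  x_matches_many <- any(key_x %in% dup_y)  # a row of x hits several rows of y
  y_matches_many <- any(key_y %in% dup_x)  # a row of y hits several rows of x
  many_to_many <- x_matches_many && y_matches_many

  if (many_to_many) {
    warning("Join keys form a many-to-many relationship.", call. = FALSE)
  }
  invisible(many_to_many)
}

# The data from the question.
x <- data.frame(mycol1 = rep("blah", 4), id = c(6875L, 8978L, 8978L, 23L))
y <- data.frame(mycol1 = rep("blah", 4), id = c(29L, 8978L, 23L, 23L),
                value = c(0, 19.5, 100.9, 123))

# Capture the warning instead of printing it, to show that it fires.
result <- tryCatch(
  warn_if_many_to_many(x, y, by = c("mycol1", "id")),
  warning = function(w) conditionMessage(w)
)
```

Because the warning is raised by your own package code, it does not depend on how dplyr decides whether a call is direct.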

Good luck.

Thank you very much for this. Thanks to you I now understand what's going on. After reading the source code in join-rows.R and searching for is_direct in the issues on the repo, I found this old issue where someone else was asking about the same thing (I hadn't come across it before).

That issue has been closed; in the end they decided not to mention the hack with Sys.setenv(TESTTHAT_PKG = "mypkg") in the vignette.

If I understand the code correctly, this behaviour only affects the "many-to-many" warning. In theory, I guess one could change the behaviour of is_direct

is_direct <- function(env) {
  env_inherits_global(env) || from_testthat(env)
}

so that there is an additional OR that references a global environment variable that a user can set so as to force these warnings to be surfaced regardless, but I expect there are good reasons not to.
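To make that idea concrete, here is a rough base-R sketch of what such a check could look like. Everything in it is an assumption on my part: the *_sketch functions are my own stand-ins rather than dplyr's code, the bodies are guesses at what the helpers named above might do (including how the TESTTHAT_PKG hack takes effect), and MYPKG_FORCE_DIRECT_WARNINGS is a made-up variable name that dplyr does not read.

```r
# Stand-in for env_inherits_global(): TRUE when the top-level environment
# of `env` is the global environment. (Assumed behaviour, not dplyr's code.)
env_inherits_global_sketch <- function(env) {
  identical(topenv(env), globalenv())
}

# Stand-in for from_testthat(): TRUE when TESTTHAT_PKG names the package
# whose namespace `env` belongs to. (Assumed behaviour, not dplyr's code.)
from_testthat_sketch <- function(env) {
  pkg <- Sys.getenv("TESTTHAT_PKG")
  top <- topenv(env)
  nzchar(pkg) && isNamespace(top) && identical(getNamespaceName(top)[[1]], pkg)
}

# The hypothetical extra condition: a user-set environment variable.
force_requested_sketch <- function() {
  identical(Sys.getenv("MYPKG_FORCE_DIRECT_WARNINGS"), "true")
}

# is_direct() with the additional OR described above.
is_direct_sketch <- function(env) {
  env_inherits_global_sketch(env) ||
    from_testthat_sketch(env) ||
    force_requested_sketch()
}

# Try it against the global environment and a package namespace.
Sys.unsetenv(c("TESTTHAT_PKG", "MYPKG_FORCE_DIRECT_WARNINGS"))
ns <- asNamespace("stats")
from_global  <- is_direct_sketch(globalenv())
from_package <- is_direct_sketch(ns)

Sys.setenv(MYPKG_FORCE_DIRECT_WARNINGS = "true")
from_package_forced <- is_direct_sketch(ns)
Sys.unsetenv("MYPKG_FORCE_DIRECT_WARNINGS")
```

In this sketch a call from a package namespace only counts as direct when one of the two environment variables opts in.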

I will mark this as solved - thanks again!