The code below emits an unexpected many-to-many relationship warning when it is run:
x <- data.frame(mycol1 = rep("blah", 4), id = c(6875L, 8978L, 8978L, 23L))
y <- data.frame(mycol1 = rep("blah", 4), id = c(29L, 8978L, 23L, 23L), value = c(0, 19.5, 100.9, 123))
z <- dplyr::inner_join(x, y, by = c("mycol1", "id"))
Warning message:
In dplyr::inner_join(x, y, by = c("mycol1", "id")) :
  Detected an unexpected many-to-many relationship between `x` and `y`.
  Row 4 of `x` matches multiple rows in `y`.
  Row 2 of `y` matches multiple rows in `x`.
  If a many-to-many relationship is expected, set `relationship = "many-to-many"` to silence this warning.
If I wrap this in a function and call that function, I get the same warning:
myfunction <- function() {
  x <- data.frame(mycol1 = rep("blah", 4), id = c(6875L, 8978L, 8978L, 23L))
  y <- data.frame(mycol1 = rep("blah", 4), id = c(29L, 8978L, 23L, 23L), value = c(0, 19.5, 100.9, 123))
  z <- dplyr::inner_join(x, y, by = c("mycol1", "id"))
  return(z)
}
z <- myfunction()
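To check for the warning programmatically rather than by reading the console, the call can be wrapped in tryCatch(). This is only a minimal sketch, assuming dplyr is installed and the script is sourced or run at the top level; `captured` is a name I made up for illustration.

```r
# Same function as in the example, repeated so this snippet runs on its own
myfunction <- function() {
  x <- data.frame(mycol1 = rep("blah", 4), id = c(6875L, 8978L, 8978L, 23L))
  y <- data.frame(mycol1 = rep("blah", 4), id = c(29L, 8978L, 23L, 23L), value = c(0, 19.5, 100.9, 123))
  dplyr::inner_join(x, y, by = c("mycol1", "id"))
}

# If a warning is signalled, tryCatch() returns the condition object
# instead of the joined data frame
captured <- tryCatch(myfunction(), warning = function(w) w)

if (inherits(captured, "warning")) {
  print(class(captured))              # condition classes attached to the warning
  cat(conditionMessage(captured), "\n")
} else {
  message("No warning was signalled; got a data frame with ", nrow(captured), " rows")
}
```

Running the same check against the packaged version of the function would show whether the warning is never signalled there, or is signalled but not displayed.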
However, if this function is put inside a package and called directly from that package, the warning is never displayed to the user.
It is displayed if the function is run via e.g. devtools::test() or similar during package development.
My understanding is that once I put the function inside a package, its code runs in the package's own controlled environment, whereas sourced code runs in my global environment, and that dplyr's warning system relies on this calling context to decide whether to warn about many-to-many joins. That is, I should expect the warning when the function is sourced (treated as user-level code) but not when it is called as a namespaced package function (treated as internal). Meanwhile, devtools::test() runs the function in a way that still emits the warning. I also assume that this applies to all dplyr warnings, not just the many-to-many relationship one.
Is this understanding correct?
If so, is there a way to set up a package, or the environment in which a package runs, so that all dplyr warnings are still displayed?
(The full context is that we have various packages that use dplyr for data manipulation. The data often comes from third-party sources, so we very much do want to see the warnings, because they most likely point to a problem with the data that needs investigating and fixing.)
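One partial workaround I am aware of is to state the expected relationship explicitly, so that a violation is raised as an error instead of depending on the warning. The sketch below assumes dplyr 1.1.1 or later, which I believe is where the `relationship` argument was introduced. `strict_join` is a hypothetical helper name, and "many-to-one" is just my assumption about what this particular data should look like.

```r
x <- data.frame(mycol1 = rep("blah", 4), id = c(6875L, 8978L, 8978L, 23L))
y <- data.frame(mycol1 = rep("blah", 4), id = c(29L, 8978L, 23L, 23L), value = c(0, 19.5, 100.9, 123))

# Hypothetical helper: declares that each row of `x` should match
# at most one row of `y`, so unexpected duplicate keys in `y` raise an error
strict_join <- function(x, y) {
  dplyr::inner_join(x, y, by = c("mycol1", "id"), relationship = "many-to-one")
}

# Capture the error so the script keeps running and the message can be inspected
res <- tryCatch(strict_join(x, y), error = function(e) e)

if (inherits(res, "error")) {
  cat("Join rejected:\n", conditionMessage(res), "\n")
}
```

This turns the check into a hard failure and only covers joins where the expected relationship is known in advance, so it does not answer the more general question about other dplyr warnings.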
Also, in case it helps, this is what the stack trace looks like when the warning is thrown under devtools::test():
  1. ├─mypkg::myfn(arg1, arg2, arg3) at test.myfn.R:9:1
  2. │ ├─dplyr::inner_join(...) at mypkg/R/myfn.R:41:5
  3. │ └─dplyr:::inner_join.data.frame(...)
  4. │   └─dplyr:::join_mutate(...)
  5. │     └─dplyr:::join_rows(...)
  6. │       └─dplyr:::dplyr_locate_matches(...)
  7. │         ├─base::withCallingHandlers(...)
  8. │         └─vctrs::vec_locate_matches(...)
  9. ├─vctrs:::warn_matches_relationship_many_to_many(...)
 10. │ └─vctrs:::warn_matches_relationship(...)
 11. │   └─vctrs:::warn_matches(...)
 12. │     └─vctrs:::warn_vctrs(...)
 13. │       └─rlang::warn(...)
 14. │         └─base::warning(cnd)
 15. │           └─base::withRestarts(...)
 16. │             └─base (local) withOneRestart(expr, restarts[[1L]])
 17. │               └─base (local) doWithOneRestart(return(expr), restart)
 18. └─dplyr (local) `<fn>`(`<vc______>`)
 19.   └─dplyr:::rethrow_warning_join_relationship_many_to_many(cnd, error_call)
 20.     └─dplyr:::warn_join(...)
 21.       └─dplyr:::warn_dplyr(...)
Thanks in advance for any help!