Hello. This is kind of a weird pattern, so I won't be surprised if there's not an established approach, but I wanted to see if I could get some feedback from the folks who know. Also, if you can think of a better way to do this, I'm all ears.
The situation
I have a modeling package in which users can define a function that does some data prep before the modeling starts. If they put the script in a defined place and name the function make_input_data(), it gets sourced and called automatically when they submit their model to run (see the footnote for an explanation, if you care).
The model object (I'm using S3 classes) then has a build_data() method that does something like this:
#' @param .mod the model S3 object
build_data.my_model_class <- function(.mod) {
  # make sure the function doesn't exist in a parent environment
  suppressSpecificWarning(rm(make_input_data), .regexpr = "object 'make_input_data' not found")
  # source and call function
  source(get_path_to_input_data_script(.mod), local = TRUE)
  input_data <- make_input_data(.mod)
  return(input_data)
}
This seems to work how I expect it to and it does the job.
The problem
The issue is that devtools::check() (R CMD check, I guess) now gives me this note:
build_data.my_model_class: no visible global function definition for
‘make_input_data’
Undefined global functions or variables:
make_input_data
I guess this is fine... it's just a note and I could ignore it. But I don't love it. Is there a way to get around this?
I had the idea to put something like make_input_data <- NULL or make_input_data <- "placeholder" in my aaa.R, but that didn't work, I guess because R CMD check still sees make_input_data(...) in the function and doesn't see a function definition anywhere.
Then I tried make_input_data <- function() {stop("this should never get called")}, thinking that my rm(...) at the top of the function would knock this out and the sourced function would take its place, but this broke my code: now I just see that error whenever I call build_data().
That was the most surprising thing... I guess you can't rm() a function that's part of a loaded package? And also, I guess the function that's loaded as part of the package overrides the function with the same name that I sourced? For some reason I thought it was the opposite, which is why I set that as an error in my placeholder (because it should never actually get called). I'm not sure what's going on there, but if anyone has a good idea, I would love to be educated.
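For comparison, here is a minimal sketch of what standard lexical scoping does in a plain R session, outside any package namespace. It does not reproduce or explain the behavior described above; it only shows the baseline: rm() called inside a function acts on that function's own frame (inherits = FALSE by default), and a definition sourced with local = TRUE shadows one of the same name in an enclosing environment. The names build_data_demo and the temp script are hypothetical, made up for this demo.

```r
# Placeholder standing in for a definition in an enclosing environment
# (here the top level, NOT a package namespace).
make_input_data <- function(.mod) stop("this should never get called")

# Hypothetical user script, written to a temp file for the demo.
script <- tempfile(fileext = ".R")
writeLines('make_input_data <- function(.mod) paste("data for", .mod)', script)

build_data_demo <- function(.mod) {
  # rm() looks only in the current frame by default, so this just warns
  # (suppressed here) and leaves the outer placeholder untouched.
  suppressWarnings(rm(make_input_data))
  placeholder_survived <- exists("make_input_data", inherits = TRUE)

  # local = TRUE evaluates the script in this function's frame,
  # so the sourced definition is found first when the name is looked up.
  source(script, local = TRUE)
  list(
    placeholder_survived = placeholder_survived,
    result = make_input_data(.mod)
  )
}

out <- build_data_demo("model_1")
print(out)
```

If a package build behaves differently from this, the difference is presumably coming from somewhere other than the lookup order shown here, which might help narrow down where to look.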
Ok, so the primary question is how to get rid of the note, and then the secondary question is whether anyone understands the precedence thing happening with package functions vs source functions. Thanks for any help!
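One possible direction for the note, as a sketch rather than an established approach: source the script into a fresh environment and fetch the function by name with get(), so the package code never contains a bare make_input_data() call for the static check to flag. The names build_data_sketch, script_env, and the temp script are hypothetical, and this has not been run through R CMD check here. The other commonly used option is declaring the name with utils::globalVariables("make_input_data"), which silences the note without changing the code.

```r
# Hypothetical user script standing in for make_input_data.R.
script <- tempfile(fileext = ".R")
writeLines(
  'make_input_data <- function(.mod) data.frame(model = .mod, x = 1:3)',
  script
)

# Hypothetical helper: source into a dedicated environment, then look the
# function up by name instead of calling the symbol directly.
build_data_sketch <- function(.mod, script_path) {
  script_env <- new.env(parent = globalenv())
  source(script_path, local = script_env)

  # inherits = FALSE means only a definition from the script itself counts,
  # never one that happens to exist in a parent environment.
  if (!exists("make_input_data", envir = script_env,
              mode = "function", inherits = FALSE)) {
    stop("script does not define make_input_data(): ", script_path)
  }
  f <- get("make_input_data", envir = script_env,
           mode = "function", inherits = FALSE)
  f(.mod)
}

input_data <- build_data_sketch("model_1", script)
print(input_data)
```

A side effect of this design is that the rm() step is no longer needed, because the lookup is restricted to the environment the script was sourced into.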
Footnote: None of this is relevant to the question, but just in case anyone wonders "why are you doing any of this? why not just pass in a tibble or whatever?" The reason for these gymnastics is that the input data for these models might pull together several data sources and do some transformations on them, and we want to have some provenance over whether the "same data" was used for various modeling runs. We do this by storing a hash of the make_input_data.R script and checking against that. This is not perfect, but it's a lot easier than trying to store hashes of all the various input files and/or data objects that get munged together before modeling. So in our case, we only care about whether the script has changed, and that serves as a proxy for whether any of the inputs have changed. Plus, if we really want to be sure, we can actually run the script again and verify the outputs, as opposed to trying to keep track of the inputs. Hope that's helpful.
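To make the hash check concrete, here is a minimal sketch. It assumes an MD5 hash via tools::md5sum(), which is only an assumption for illustration; the hash function actually used is not stated above. The helper script_unchanged and the temp script are hypothetical.

```r
# Hypothetical script standing in for make_input_data.R.
script <- tempfile(fileext = ".R")
writeLines("make_input_data <- function(.mod) 1", script)

# Store the hash at model-submission time (MD5 is an assumed choice).
stored_hash <- unname(tools::md5sum(script))

# Hypothetical helper: TRUE if the script on disk still matches the stored hash.
script_unchanged <- function(path, expected) {
  identical(unname(tools::md5sum(path)), expected)
}

before <- script_unchanged(script, stored_hash)

# Editing the script changes its hash, so the check now fails.
writeLines("make_input_data <- function(.mod) 2", script)
after <- script_unchanged(script, stored_hash)

print(c(before = before, after = after))
```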