source user-defined functions in a package

Seth127 · March 11, 2021, 7:59pm

Hello. This is kind of a weird pattern, so I won't be surprised if there's not an established approach, but I wanted to see if I could get some feedback from the folks who know. Also, if you can think of a better way to do this, I'm all ears.

The situation

I have a modeling package where users can define a function that does some data prep before the modeling starts. If they put the script in a defined place and name the function make_input_data(), it gets sourced and called automatically when they submit their model to run. (See the footnote for an explanation, if you care.)

Then the model object (I'm using S3 classes) has a build_data() method that does something like this:

#' @param .mod the model S3 object
build_data.my_model_class <- function(.mod) {
  # make sure the function doesn't exist in a parent environment
  suppressSpecificWarning(rm(make_input_data), .regexpr = "object 'make_input_data' not found")

  # source and call function
  source(get_path_to_input_data_script(.mod), local = TRUE)
  input_data <- make_input_data(.mod)

  return(input_data)
}

This seems to work how I expect it to and it does the job.
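
For readers unfamiliar with source(local = TRUE), here is a minimal, self-contained sketch of the pattern. The temporary script and the name build_data_demo() are stand-ins invented for illustration; they are not part of the package described above.

```r
# Write a stand-in user script to a temporary file (hypothetical content).
script <- tempfile(fileext = ".R")
writeLines("make_input_data <- function(.mod) paste('data for', .mod)", script)

build_data_demo <- function(.mod) {
  # local = TRUE evaluates the script in this function's own frame,
  # so make_input_data becomes a local variable of this call.
  source(script, local = TRUE)
  stopifnot(exists("make_input_data", inherits = FALSE))
  make_input_data(.mod)
}

result <- build_data_demo("model_1")
print(result)
```

Because the definition lives only in the frame of that one call, it disappears when the call returns and does not leak into the caller's workspace.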

The problem

The issue is that now devtools::check() (R CMD check, I guess) gives me this note:

  build_data.my_model_class: no visible global function definition for
    ‘make_input_data’
  Undefined global functions or variables:
    make_input_data

I guess this is fine... it's just a note and I could ignore it. But I don't love it. Is there a way to get around this?

I had the idea to put something like make_input_data <- NULL or make_input_data <- "placeholder" in my aaa.R but that didn't work, I guess because R CMD check still sees make_input_data(...) in the function and doesn't see a function definition anywhere.

Then I tried make_input_data <- function() {stop("this should never get called")}, thinking that my rm(...) at the top of the function would knock this out and then replace with my sourced function, but this broke my code because now I just see that error whenever I call build_data().

That was the most surprising thing... I guess you can't rm() a function that's part of a loaded package? And also, I guess the function that's loaded as part of the package overrides the function with the same name that I sourced? For some reason I thought it was the opposite, which is why I set that as an error in my placeholder (because it should never actually get called). I'm not sure what's going on there, but if anyone has a good idea, I would love to be educated.
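
As general background on the two rules in play (not a diagnosis of the behavior described above): rm() removes from the calling environment by default, and a function call is resolved by searching the local frame first and then the enclosing environments. The sketch below simulates a package namespace with a plain environment; pkg_env, helper, and caller are hypothetical names.

```r
# A "package-level" definition, simulated with an enclosing environment.
pkg_env <- new.env()
pkg_env$helper <- function() "package version"

caller <- function() {
  # rm() defaults to the current frame, where 'helper' does not exist yet,
  # so this only warns; the definition in pkg_env is untouched.
  suppressWarnings(rm(helper))
  before <- helper()                      # found by walking up to pkg_env
  helper <- function() "local version"    # now defined in the local frame
  after <- helper()                       # the local frame is searched first
  c(before = before, after = after)
}
environment(caller) <- pkg_env

out <- caller()
print(out)
```

In a real package the namespace is also locked once loaded, so an rm() aimed directly at it would fail rather than remove the binding.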

Ok, so the primary question is how to get rid of the note, and then the secondary question is whether anyone understands the precedence thing happening with package functions vs source functions. Thanks for any help!

Footnote: None of this is relevant to the question, but just in case anyone wonders "why are you doing any of this? why not just pass in a tibble or whatever?" The reason for these gymnastics is that the input data for these models might pull together several data sources and do some transformations on them, and we want to have some provenance over whether the "same data" was used for various modeling runs. We do this by storing a hash of the make_input_data.R script and checking against that. This is not perfect, but it's a lot easier than trying to store hashes of all the various input files and/or data objects that get munged together before modeling. So in our case, we only care about whether the script has changed, and that serves as a proxy for whether any of the inputs have changed. Plus, if we really want to be sure, we can actually run the script again and verify the outputs, as opposed to trying to keep track of the inputs. Hope that's helpful.
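
A rough sketch of the script-hash idea, assuming MD5 via tools::md5sum(). The post does not say which hash function the package actually uses, and the temporary file here is a stand-in for make_input_data.R.

```r
# Stand-in for the user's make_input_data.R (hypothetical content).
script <- tempfile(fileext = ".R")
writeLines("make_input_data <- function(.mod) 1", script)

# Store the hash when the model is first run.
stored_hash <- unname(tools::md5sum(script))

# Later: the script is untouched, so the hash still matches.
script_unchanged <- identical(unname(tools::md5sum(script)), stored_hash)

# After any edit to the script, the comparison fails.
writeLines("make_input_data <- function(.mod) 2", script)
script_changed <- !identical(unname(tools::md5sum(script)), stored_hash)

print(c(unchanged = script_unchanged, changed = script_changed))
```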

Seth127 · March 11, 2021, 8:13pm

Well, a coworker just informed me that the simple solution is to use globalVariables("make_input_data") and that gets rid of the note. I guess I should've asked Slack first...
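
For anyone landing here later, the declaration would typically sit at the top level of a file in the package's R/ directory. Putting it in aaa.R is an assumption based on the file mentioned earlier in the thread.

```r
# Top level of a package source file, e.g. R/aaa.R (assumed location).
# Declares that this name is intentionally resolved at run time, so the
# static code check does not report it as an undefined global.
utils::globalVariables("make_input_data")
```

This only silences the note. It does not create a binding, so it has no effect on which make_input_data is found at run time.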

Anyway, if anyone has clarity on the secondary question, I'm still interested, but my need is much less urgent. Thanks!

nirgrahamuk · March 11, 2021, 8:17pm

I haven't tried to do this in a package-building context, so it's possible this would throw other notes/warnings/errors, but I had the thought that maybe, instead of sourcing the script to get a function stood up, you could use readLines(), and your package could consume the text and evaluate it into existence?


myfunctext <- "myfunc<-function(x)x+1"

eval(str2expression(myfunctext))

myfunc(1)
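
Extending the same idea to a file on disk, here is a hedged sketch that reads the script text, evaluates it in a dedicated environment, and fetches the function by name. The temporary file and script_env are stand-ins invented for illustration.

```r
# Stand-in for the user's script on disk (hypothetical content).
script <- tempfile(fileext = ".R")
writeLines("myfunc <- function(x) x + 1", script)

# Read the text, parse it, and evaluate it in a fresh environment so the
# definition does not land in the caller's frame or the workspace.
script_env <- new.env()
eval(parse(text = readLines(script)), envir = script_env)

# Fetch the function by name and call it through a local variable.
myfunc <- get("myfunc", envir = script_env, mode = "function")
result <- myfunc(1)
print(result)
```

Since the function is retrieved with get() and bound to a local name, the static check should have no call to an undefined global to complain about, though that has not been verified against the package in question.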

Seth127 · March 11, 2021, 10:25pm

mmm, that's an interesting idea. I'd have to think on that. Not sure what the pros and cons are, but I don't think it changes the core concern here.
