I have an ML model I want to deploy using vetiver. The model uses variables from two data sources. One set of inputs should be provided on the /predict endpoint along with a GEOID variable. The second set of inputs are geographic variables I have independently collected.
When the /predict endpoint is called, I would like to use the provided GEOID to join the inputs from /predict with a local data object that contains all the values for the geographic variables. The joined set of values is what should then be passed to the predict() function. I'm having trouble conceptualizing how to implement this workflow, since by default the API requires all inputs to be passed directly to the /predict endpoint. Has anyone implemented this type of workflow before?
I would approach this by writing a custom handler. Unfortunately we don't have great documentation on this yet, but I would point you to this issue for some initial examples and approaches.
Here is the general outline of what I would do:
library(tidymodels)
library(vetiver)
#>
#> Attaching package: 'vetiver'
#> The following object is masked from 'package:tune':
#>
#> load_pkgs
model_fit <- workflow(mpg ~ wt + cyl, linear_reg()) |> fit(mtcars)
## set the prototype to be the data you want passed into the API:
v <- vetiver_model(
  model_fit,
  "linear-mod",
  save_prototype = tibble(wt = mtcars$wt)
)
library(plumber)
handle_join <- function(v, ...) {
  v$model <- bundle::unbundle(v$model)
  function(req) {
    new_data <- req$body
    ## the saved input prototype is stored in v$prototype:
    new_data <- vetiver_type_convert(new_data, v$prototype)
    ## here is where you bind or join the input data to your supplementary data:
    new_data <- cbind(new_data, cyl = 6)
    ## now predict:
    predict(v$model, new_data = new_data, ...)
  }
}
pr() |>
  ## still add the default vetiver handler so you get /ping, etc
  ## but you don't want to use the autogenerated /predict endpoint:
  vetiver_api(v, path = "/dummy") |>
  pr_post(path = "/predict", handler = handle_join(v))
#> # Plumber router with 5 endpoints, 4 filters, and 1 sub-router.
#> # Use `pr_run()` on this object to start the API.
#> ├──[queryString]
#> ├──[body]
#> ├──[cookieParser]
#> ├──[sharedSecret]
#> ├──/dummy (POST)
#> ├──/logo
#> │ │ # Plumber static router serving from directory: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/vetiver
#> ├──/metadata (GET)
#> ├──/ping (GET)
#> ├──/predict (POST)
#> └──/prototype (GET)
You would not use the autogenerated endpoint that is now at /dummy but instead would use your custom /predict endpoint. Let me know if any of that doesn't make sense!
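For the GEOID case specifically, the `cbind(new_data, cyl = 6)` placeholder could be swapped for a lookup against your local table. Here is a minimal sketch in base R, not something built into vetiver: `geo_lookup`, `join_geo()`, and the columns `median_income` and `pop_density` are hypothetical names with made-up values, standing in for whatever geographic data you have collected.

```r
## hypothetical local table of geographic variables, keyed by GEOID
## (the values here are invented for illustration only)
geo_lookup <- data.frame(
  GEOID = c("01001", "01003", "01005"),
  median_income = c(58000, 61000, 35000),
  pop_density = c(91.8, 114.6, 31.0),
  stringsAsFactors = FALSE
)

## hypothetical helper: look up each incoming GEOID and append the
## geographic columns, keeping the rows in the order they were sent
join_geo <- function(new_data, lookup, key = "GEOID") {
  idx <- match(new_data[[key]], lookup[[key]])
  if (anyNA(idx)) {
    missing_keys <- unique(new_data[[key]][is.na(idx)])
    stop("Unknown ", key, ": ", paste(missing_keys, collapse = ", "))
  }
  extra <- lookup[idx, setdiff(names(lookup), key), drop = FALSE]
  rownames(extra) <- NULL
  cbind(new_data, extra)
}

## stand-in for the parsed request body arriving at /predict
request_body <- data.frame(
  GEOID = c("01005", "01001"),
  wt = c(2.6, 3.2),
  stringsAsFactors = FALSE
)

joined <- join_geo(request_body, geo_lookup)
joined
```

Inside `handle_join()`, the call would be something like `new_data <- join_geo(new_data, geo_lookup)` in place of the `cbind()` line, with GEOID included in the prototype so it is accepted as an input. Stopping on an unknown GEOID is one design choice; you may prefer to return `NA` predictions or a custom error response instead.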