Deploying ML model with predictors provided via API and locally

bensoltoff · July 3, 2024, 3:49pm

I have an ML model that I want to deploy using vetiver. The model uses variables from two data sources. One set of inputs should be provided to the /predict endpoint along with a GEOID variable. The second set of inputs is a collection of geographic variables that I have gathered independently.

When the /predict endpoint is called, I would like to use the provided GEOID to join the inputs from /predict with a local data object that contains the values of all the geographic variables. The joined data is what should then be passed to predict(). I'm having trouble conceptualizing how to implement this workflow, since by default the API requires all inputs to be passed directly to the /predict endpoint. Has anyone implemented this type of workflow before?

meztez · July 3, 2024, 7:59pm

Load the data object in the plumber API execution environment?

library(plumber)
a <- new.env(parent = .GlobalEnv)
assign("data", mtcars, envir = a)
pr(envir = a) |> ....

julia · July 4, 2024, 1:54am

I would approach this by writing a custom handler. Unfortunately we don't have great documentation on this yet, but I would point you to this issue for some initial examples and approaches.

Here is the general outline of what I would do:

library(tidymodels)
library(vetiver)
#> 
#> Attaching package: 'vetiver'
#> The following object is masked from 'package:tune':
#> 
#>     load_pkgs


model_fit <- workflow(mpg ~ wt + cyl, linear_reg()) |> fit(mtcars)

## set the prototype to be the data you want passed into the API:
v <- vetiver_model(
  model_fit, 
  "linear-mod", 
  save_prototype = tibble(wt = mtcars$wt)
)

library(plumber)

handle_join <- function(v, ...) {
  v$model <- bundle::unbundle(v$model)
  function(req) {
    new_data <- req$body
    ## convert the request body to the column types of the saved prototype:
    new_data <- vetiver_type_convert(new_data, v$prototype)

    ## here is where you bind or join the input data to your supplementary data:
    new_data <- cbind(new_data, cyl = 6)
    
    ## now predict:
    predict(v$model, new_data = new_data, ...)
  }
}

pr() |> 
  ## still add the default vetiver handler so you get /ping, etc
  ## but you don't want to use the autogenerated /predict endpoint:
  vetiver_api(v, path = "/dummy") |> 
  pr_post(path = "/predict", handler = handle_join(v))
#> # Plumber router with 5 endpoints, 4 filters, and 1 sub-router.
#> # Use `pr_run()` on this object to start the API.
#> ├──[queryString]
#> ├──[body]
#> ├──[cookieParser]
#> ├──[sharedSecret]
#> ├──/dummy (POST)
#> ├──/logo
#> │  │ # Plumber static router serving from directory: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library/vetiver
#> ├──/metadata (GET)
#> ├──/ping (GET)
#> ├──/predict (POST)
#> └──/prototype (GET)

Created on 2024-07-03 with reprex v2.1.0

You would not use the autogenerated endpoint that is now at /dummy but instead would use your custom /predict endpoint. Let me know if any of that doesn't make sense!
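
To make the join step inside the handler more concrete, here is a minimal sketch in base R of looking up supplementary values by GEOID. The `geo_lookup` table, its column names and values, and the `join_geo()` helper are all hypothetical and only for illustration; your own lookup object will look different.

```r
## Hypothetical lookup table of geographic predictors, keyed by GEOID.
## Column names and values are made up for illustration.
geo_lookup <- data.frame(
  GEOID = c("36109", "17031"),
  pop_density = c(215.3, 5495.1),
  median_income = c(61000, 72000),
  stringsAsFactors = FALSE
)

## Hypothetical helper: attach the lookup columns to the request data.
## match() keeps the rows in request order, and unknown GEOIDs fail loudly
## instead of silently producing NA predictors.
join_geo <- function(new_data, lookup, key = "GEOID") {
  new_data[[key]] <- as.character(new_data[[key]])
  idx <- match(new_data[[key]], lookup[[key]])
  if (anyNA(idx)) {
    missing_ids <- unique(new_data[[key]][is.na(idx)])
    stop("Unknown ", key, ": ", paste(missing_ids, collapse = ", "))
  }
  extra <- lookup[idx, setdiff(names(lookup), key), drop = FALSE]
  rownames(extra) <- NULL
  cbind(new_data, extra)
}

## Stand-in for a parsed request body with two rows:
request_body <- data.frame(
  GEOID = c("17031", "36109"),
  wt = c(2.6, 3.2),
  stringsAsFactors = FALSE
)
joined <- join_geo(request_body, geo_lookup)
joined
```

In the handler above, a call like `new_data <- join_geo(new_data, geo_lookup)` would take the place of the `cbind(new_data, cyl = 6)` line, with `geo_lookup` loaded once when the API starts. This assumes the prototype passed to `save_prototype` also includes GEOID, probably as a character column so that leading zeros are preserved. Depending on how the model was fit, you may also need to drop GEOID again before calling `predict()`.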

bensoltoff · July 5, 2024, 6:01pm

Thanks! This looks exactly like what I need.
