Hi there, I am working inside a research environment remote desktop and have built a cross-validated random forest model. I saved the workflow as a vetiver model in a model board on the remote desktop and downloaded it onto my own personal computer as an RDS file. However, I am unable to apply it on my PC to an external dataset. I would really appreciate any guidance on this.
Can you share a little more about the kind of code that is causing problems for you? If I were wanting to do this, I would start out by storing the model on the first machine with something like this:
library(tidymodels)
library(vetiver)
#>
#> Attaching package: 'vetiver'
#> The following object is masked from 'package:tune':
#>
#> load_pkgs
library(pins)
data(Sacramento)
rf_spec <- rand_forest(mode = "regression")
rf_form <- price ~ type + sqft + beds + baths
rf_fit <-
  workflow(rf_form, rf_spec) %>%
  fit(Sacramento)
v <- vetiver_model(rf_fit, "sacramento_rf")
model_board <- board_folder(path = "/tmp/test")
vetiver_pin_write(model_board, v)
#> Creating new version '20240925T201217Z-73f49'
#> Writing to pin 'sacramento_rf'
#>
#> Create a Model Card for your published model
#> • Model Cards provide a framework for transparent, responsible reporting
#> • Use the vetiver `.Rmd` template as a place to start
Created on 2024-09-25 with reprex v2.1.1
And then on the second machine, I would read the model and predict() like this:
library(tidymodels)
library(vetiver)
#>
#> Attaching package: 'vetiver'
#> The following object is masked from 'package:tune':
#>
#> load_pkgs
library(pins)
data(Sacramento)
model_board <- board_folder(path = "/tmp/test")
v <- vetiver_pin_read(model_board, "sacramento_rf")
predict(v, Sacramento[5:10,])
#> # A tibble: 6 × 1
#> .pred
#> <dbl>
#> 1 106683.
#> 2 133271.
#> 3 147360.
#> 4 158743.
#> 5 128204.
#> 6 155486.
Created on 2024-09-25 with reprex v2.1.1
Are you trying to do something different?
Thanks so much for your prompt reply. I can't use the same board folder to read the model, because I downloaded it onto my PC. I downloaded "v" from the remote desktop as an .RDS file. When I use readRDS() and then try to predict with it, I get the error message attached.
Would really appreciate help with this. Thanks again.
Data splitting
set.seed(23102013)
dta_split <- initial_split(dta1, prop = 3/5)
dta_train = training(dta_split)
dta_test = testing(dta_split)
dta_fold = mc_cv(dta_train, times = 50)
nrow(dta_train)
nrow(dta_test)
Model building
tree_rec = recipe(No_Nadir ~ ., data = dta_train) %>%
  update_role(OS, All.Cause.Mortality, new_role = "id variable") %>%
  step_center(AGE, ECOGBL, PSABL, HGBBL, BASEBMI) %>%
  step_scale(AGE, ECOGBL, PSABL, HGBBL, BASEBMI) %>%
  step_dummy(all_nominal_predictors(), one_hot = FALSE)
tune_spec = rand_forest(
  mtry = tune(),
  trees = 1000,
  min_n = tune()) %>%
  set_mode("classification") %>%
  set_engine("ranger", importance = "impurity")
tune_wf = workflow() %>%
  add_model(tune_spec) %>%
  add_recipe(tree_rec)
Crossvalidation and tuning hyperparameters with grid search
library(parallel)
library(doParallel)
cl <- makePSOCKcluster(2)
registerDoParallel(cl)
set.seed(23102013)
tune_res = tune_grid(
  tune_wf,
  resamples = dta_fold,
  grid = 1000,
  control = control_resamples(save_pred = TRUE)
)
tune_res %>%
  collect_metrics() %>%
  filter(.metric == "roc_auc") %>%
  dplyr::select(mean, min_n, mtry) %>%
  pivot_longer(min_n:mtry, values_to = "value", names_to = "parameters") %>%
  ggplot(aes(value, mean, color = parameters)) +
  geom_point(show.legend = FALSE) +
  facet_wrap(~parameters, scales = "free_x") +
  labs(x = NULL, y = "AUC")
Selecting the best model
### Selecting best parameters
best_auc <- select_best(tune_res, metric = "roc_auc")
final_rf = finalize_model(tune_spec,
                          best_auc)
model_board = board_folder("/document/psa_response")
v <- vetiver_pin_read(model_board, "final_rf")
### Downloaded v from the remote environment as an RDS file.
## Now on my personal desktop
final_model = readRDS("v.RDS")
predict(final_model, dta_test3)
I would really appreciate it if someone could please help me understand where I was wrong.
Could you update your example to use the reprex package and perhaps a simple dataset that we both have access to? Using reprex makes it easier to see both the input and output, and for us to re-run the code in a local session. Thanks!
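As a possible starting point for that reprex, here is a minimal sketch of the RDS route described above, using the Sacramento data from the earlier reply. It assumes the ranger package is installed on both machines, and the file name `v.RDS` in a temporary directory is only a stand-in for wherever the downloaded file actually lives. It is not a diagnosis of the error, only a baseline that should run in a clean session:

```r
library(tidymodels)
library(vetiver)

data(Sacramento)

# Machine 1: fit a workflow and wrap the *fitted* workflow as a vetiver model
rf_fit <-
  workflow(price ~ type + sqft + beds + baths, rand_forest(mode = "regression")) %>%
  fit(Sacramento)
v <- vetiver_model(rf_fit, "sacramento_rf")

# Save the vetiver model as a single file (hypothetical path for illustration)
rds_path <- file.path(tempdir(), "v.RDS")
saveRDS(v, rds_path)

# Machine 2: attach the same packages *before* reading the file and predicting,
# so that predict() can find the method for vetiver models
final_model <- readRDS(rds_path)
class(final_model)
preds <- predict(final_model, Sacramento[5:10, ])
preds
```

If this runs on the PC but the real model does not, two things may be worth checking: whether vetiver and tidymodels were loaded before calling predict(), and whether the object that was saved was a fitted workflow rather than the unfitted specification that finalize_model() returns.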