I'm trying to do cross-validation for a logistic regression model.
I applied SMOTE to the training set and left the test set obtained from the initial split untouched.
When I make the final estimate of the error, what argument do I pass to last_fit? I cannot pass the initial split, because last_fit would then refit on the original training set, without SMOTE. My code is below:
library(tidymodels)
library(performanceEstimation) #assumption: source of smote()
#initial split
pok_split=initial_split(pokemon,prop=3/4)
pok_train=training(pok_split)
pok_test=testing(pok_split)
pok_train_fold=vfold_cv(pok_train,v=5)
#train with smote
#SMOTE on the training set, then folds
pok_smote_train=smote(be_legendary~.,pok_train,perc.over=2,perc.under=5.65) #2 and #5.65
frequenze_smote_train=count(pok_smote_train,be_legendary)
pok_test
pok_smote_fold=vfold_cv(pok_smote_train,v=5)
#logistic function with cross validation
logistica_recipe1=recipe(be_legendary~. ,data=pok_smote_train)
specifica_logistica1=logistic_reg()%>%set_engine("glm")
work_logistica1=workflow()%>%
add_model(specifica_logistica1)%>%
add_recipe(logistica_recipe1)
logistica_fit1=work_logistica1%>%fit_resamples(resamples=pok_smote_fold,control=control_resamples(save_pred = TRUE))
metriche_log_smote= collect_metrics(logistica_fit1,metric="accuracy")
si_logistica1= collect_predictions(logistica_fit1)
#########
migliore_metrica_log_smote= select_best(logistica_fit1,metric="accuracy")
work_finale_logistica1= work_logistica1%>%finalize_workflow(migliore_metrica_log_smote)
log_smote_fit_finale= work_finale_logistica1%>%last_fit(pok_split)
metriche_finali_log_smote= log_smote_fit_finale%>%collect_metrics()
predizioni_finali=collect_predictions(log_smote_fit_finale)
matrice_confusione_log_smote=table(predizioni_finali$.pred_class,pok_test$be_legendary)
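
One possible way around this, sketched under assumptions rather than tested on my data: move SMOTE into the recipe with step_smote() from the themis package (not used in my code above), so that last_fit(pok_split) applies it to the training portion only and leaves the test set untouched. This assumes the `pokemon` data frame from above, that be_legendary is a factor, and that all predictors are numeric with no missing values (otherwise a dummy-encoding or imputation step would be needed first). The names `rec_smote`, `wf_smote`, `cv_fit` and `final_fit` are made up for illustration. Since nothing is tuned here, there should be nothing to finalize, so select_best() and finalize_workflow() are left out.

```r
library(tidymodels)
library(themis)  # assumption: provides step_smote()

# Split first; SMOTE is NOT applied by hand to any data frame
pok_split <- initial_split(pokemon, prop = 3/4)
pok_train <- training(pok_split)
pok_fold  <- vfold_cv(pok_train, v = 5)

# SMOTE lives inside the recipe. step_smote() has skip = TRUE by default,
# so it runs when the recipe is trained but not when predicting on new data.
rec_smote <- recipe(be_legendary ~ ., data = pok_train) %>%
  step_smote(be_legendary)

wf_smote <- workflow() %>%
  add_model(logistic_reg() %>% set_engine("glm")) %>%
  add_recipe(rec_smote)

# Cross-validation: SMOTE is re-done on the analysis part of each fold
cv_fit <- fit_resamples(
  wf_smote,
  resamples = pok_fold,
  control = control_resamples(save_pred = TRUE)
)
collect_metrics(cv_fit)

# Final estimate: fit on the full training set (with SMOTE),
# evaluate on the untouched test set from the initial split
final_fit <- last_fit(wf_smote, pok_split)
collect_metrics(final_fit)
conf_mat(collect_predictions(final_fit),
         truth = be_legendary, estimate = .pred_class)
```

With this layout the folds are built from pok_train rather than from the SMOTE-augmented data, so synthetic points should not leak into the assessment part of each fold.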