I have a dataset like this:
```r
db_data   <- iris[1:100, ]                               # setosa and versicolor only
row_train <- sample(nrow(db_data), nrow(db_data) * 0.8)   # 80 random row indices
db_train  <- db_data[row_train, ]
db_test1  <- db_data[-row_train, ]
db_test2  <- iris[101:150, ]                             # virginica only
db_test   <- rbind(db_test1, db_test2)
```
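Subsetting a data frame in R keeps every level of a factor column, even levels that no longer have any rows. The minimal base-R sketch below, assuming the same `iris[1:100, ]` split as above (the `set.seed()` call is an addition for reproducibility only), shows that `db_train$Species` still lists `virginica` and how `droplevels()` removes it:

```r
set.seed(1)  # added only to make the random split reproducible
db_data   <- iris[1:100, ]
row_train <- sample(nrow(db_data), nrow(db_data) * 0.8)
db_train  <- db_data[row_train, ]

levels(db_train$Species)   # still lists "virginica"
table(db_train$Species)    # virginica shows a count of 0

# droplevels() removes factor levels that no longer occur in the data
db_train <- droplevels(db_train)
levels(db_train$Species)   # only "setosa" and "versicolor" remain
```

Because 80 of the 100 rows are sampled and each class has 50 rows, both setosa and versicolor are always present in `db_train`, so exactly those two levels survive.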
So my datasets are composed like this:
```
> table(db_train$Species)
    setosa versicolor  virginica
        40         40          0
> table(db_test$Species)
    setosa versicolor  virginica
        10         10        100
```
So the training set contains two classes, while the test set contains three. I tried this kNN model with caret in RStudio:
```r
library(caret)

model_knn <- train(Species ~ ., data = db_train, method = "knn",
                   tuneGrid = data.frame(k = 2:20))
summary(model_knn)

#-------
# PREDICTION NEW RECORD
#-------
test_data <- db_test
db_test$predict <- predict(model_knn, newdata = test_data)
confusionMatrix(data = factor(db_test$predict), reference = factor(db_test$Species))
```
But when I run it, I get this error:

```
Error: One or more factor levels in the outcome has no data: 'virginica'
```
How can I solve it? Is there another model I can try?
Thanks
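The error message is consistent with the empty `virginica` level in `db_train$Species`: `train()` receives a three-level outcome with no rows for one class. A minimal sketch of a workaround, assuming the caret package and the same split as above (the seed is an addition), drops the unused level before training and then gives the predictions the test set's three levels, so that `confusionMatrix()` receives two factors with identical levels:

```r
library(caret)

set.seed(1)  # added only so the split is reproducible
db_data   <- iris[1:100, ]
row_train <- sample(nrow(db_data), nrow(db_data) * 0.8)
db_train  <- droplevels(db_data[row_train, ])   # keep only setosa and versicolor
db_test   <- rbind(db_data[-row_train, ], iris[101:150, ])

model_knn <- train(Species ~ ., data = db_train, method = "knn",
                   tuneGrid = data.frame(k = 2:20))

# The model only knows two classes, so its predictions carry two levels;
# re-level them to match the three levels of the reference column
pred <- predict(model_knn, newdata = db_test)
pred <- factor(pred, levels = levels(db_test$Species))

cm <- confusionMatrix(data = pred, reference = db_test$Species)
print(cm$table)
```

Whatever `method` is passed to `train()`, a classifier fitted only on setosa and versicolor can never output `virginica`, so the `virginica` row of the matrix stays at zero and all 50 virginica test rows count as errors. A model can only predict classes it has seen during training.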