KNN model with two classes in the train dataset and three classes in test

Inuraghe · November 16, 2021, 7:58am

I have a dataset like this:

db_data <- iris[1:100,]

row_train <- sample(nrow(db_data), nrow(db_data)*0.8)
db_train <- iris[row_train,]
db_test1 <- iris[-row_train,]
db_test2 <- iris[101:150,]
db_test <- rbind(db_test1, db_test2)

So my datasets are composed this way:

> table(db_train$Species)

    setosa versicolor  virginica 
        40         40          0 
> table(db_test$Species)

    setosa versicolor  virginica 
        10         10        100

So the train dataset has two classes, while the test dataset has three. I tried this ML model in RStudio:

library(caret)

model_knn <- train(Species ~ ., data = db_train, method = "knn",
                   tuneGrid = data.frame(k = 2:20))
summary(model_knn)

#-------
# PREDICTION NEW RECORD
#-------
test_data <- db_test
db_test$predict <- predict(model_knn, newdata = test_data, interval = 'confidence')
confusionMatrix(data = factor(db_test$predict), reference = factor(db_test$Species))

but when I run it I get this error:

Error: One or more factor levels in the outcome has no data: 'virginica'

How can I solve it? Is there another model that I can test?

Thanks

Max · November 16, 2021, 4:14pm

There's really no way to deal with that unless you make the factor levels the same when you fit the model. Plus, the third class will never be predicted by the model since it is not in the training set.
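
To illustrate, here is a minimal sketch and not a definitive fix. It assumes caret is installed, uses an arbitrary seed, and rebuilds the split from the question with the held-out rows drawn from db_data. The empty 'virginica' level is dropped before fitting, and the predictions are then given the same level set as the test labels so that confusionMatrix() can compare them.

```r
library(caret)

set.seed(42)  # assumed seed, only so the split is reproducible

# Two-class training data; droplevels() removes the empty 'virginica' level
db_data   <- droplevels(iris[1:100, ])
row_train <- sample(nrow(db_data), nrow(db_data) * 0.8)
db_train  <- db_data[row_train, ]

# Test data still contains all three species
db_test <- rbind(iris[1:100, ][-row_train, ], iris[101:150, ])

model_knn <- train(Species ~ ., data = db_train, method = "knn",
                   tuneGrid = data.frame(k = 2:20))

# Predictions can only take the two levels seen during training
pred <- predict(model_knn, newdata = db_test)

# Align the prediction levels with the reference levels before comparing
pred <- factor(pred, levels = levels(db_test$Species))
confusionMatrix(data = pred, reference = db_test$Species)
```

The 'virginica' row of the resulting table stays at zero, because every virginica flower is assigned to one of the two training classes.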

Inuraghe · November 17, 2021, 8:04am

Unfortunately, in my real dataset I can't include all the classes in the training set. Is it a good idea to use an unsupervised model?
