KNN model with two classes in the train dataset and three classes in test

I have a dataset like this:

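db_data <- iris[1:100,]

row_train <- sample(nrow(db_data), nrow(db_data)*0.8)
db_train <- iris[row_train,]
# Note: iris[-row_train,] also keeps rows 101:150, so after the rbind() below
# virginica is counted twice (100 rows); db_data[-row_train,] would avoid that.
db_test1 <- iris[-row_train,]
db_test2 <- iris[101:150,]
db_test <- rbind(db_test1, db_test2)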

So my datasets are composed this way:

> table(db_train$Species)

    setosa versicolor  virginica 
        40         40          0 
> table(db_test$Species)

    setosa versicolor  virginica 
        10         10        100 

So the train dataset has two classes, while the test dataset has three. I tried this ML model in RStudio:

library(caret)  # train() comes from the caret package

model_knn <- train(Species ~ ., data = db_train, method = "knn", tuneGrid = data.frame(k = c(2:20)))

test_data <- db_test
db_test$predict <- predict(model_knn, newdata = test_data, interval = 'confidence')

but when I run it I get this error:

Error: One or more factor levels in the outcome has no data: 'virginica'

How can I solve it? Is there another model that I can test?


There's really no way to deal with that unless you make the factor levels the same when you fit the model. Plus, the third class will never be predicted by the model since it is not in the training set.
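
To illustrate the point above, here is a minimal sketch of one way to make the factor levels match the data at fit time. It assumes that dropping the unused level with base R's droplevels() is acceptable for your use case, and it builds the test set from db_data[-row_train, ] so that virginica rows are not duplicated; the seed value is arbitrary. This is an illustration, not necessarily the only fix.

```r
library(caret)

set.seed(42)  # arbitrary seed, only for reproducibility of the split

db_data   <- iris[1:100, ]
row_train <- sample(nrow(db_data), nrow(db_data) * 0.8)
db_train  <- db_data[row_train, ]
db_test   <- rbind(db_data[-row_train, ], iris[101:150, ])

# Drop the empty 'virginica' level so every outcome level has data
db_train$Species <- droplevels(db_train$Species)

model_knn <- train(Species ~ ., data = db_train, method = "knn",
                   tuneGrid = data.frame(k = 2:20))

# Predictions can only be 'setosa' or 'versicolor'
pred <- predict(model_knn, newdata = db_test)
table(predicted = pred, actual = db_test$Species)
```

In the resulting table every virginica row lands in one of the two trained classes, which is the limitation described above: the model has no way to output a class it never saw.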

Unfortunately, in my real dataset I can't include all the classes in the training set. Would it be a good idea to use an unsupervised model?