Error: `data` and `reference` should be factors with the same levels

I have a CSV file with 3 feature columns plus 1 column of true labels. I used kmeans to find a three-cluster solution. When I compare the cluster labels with the true labels using a confusion matrix, I get the error above. Why?

# Install required packages:
install.packages("factoextra")
install.packages("caret")
install.packages("dplyr")
install.packages("mlbench")
install.packages("tidyr")


# Load required libraries:
library(factoextra)
library(caret)
library(dplyr)
library(mlbench)
library(tidyr)
library(e1071)

data_a <- read.csv(file = 'data_a.csv')

colnames(data_a)[1:4] <- c(' ',' ',' ','class')
head(data_a)
summary(data_a)

# Compute k-means with k = 3
dfa<-data_a[1:3]
set.seed(123)
kmean_model <- kmeans(dfa, 3, nstart = 25)
# Print the results
print(kmean_model)
fviz_cluster(kmean_model, data = data_a)

kmean_model_cluster <- as.data.frame(kmean_model$cluster)
names(kmean_model_cluster)[1] <- 'class'
head(kmean_model_cluster)
head(data_a)

kmean_model_cluster$class <- as.factor(ifelse(kmean_model_cluster=='1','2','3'))
kmean_model_cluster$class

confusionMatrix(data_a$class, kmean_model_cluster$class)

Also, the result of kmean_model_cluster$class does not look right.

data_a$class is a character vector with 3 distinct values,
whereas kmean_model_cluster$class is a factor with only 2 levels. You made it so with this line:
kmean_model_cluster$class <- as.factor(ifelse(kmean_model_cluster=='1','2','3'))
confusionMatrix needs both arguments to be factors with the same levels, which is why it raises the error.
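
To see the mismatch without the original file, here is a small base R sketch. The vectors `truth` and `clusters` are made-up stand-ins for data_a$class and kmean_model$cluster, and the comparison is done on a plain vector rather than on the whole data frame as in the question, so treat it as an illustration only.

```r
# Hypothetical stand-ins for the real data (not the asker's file)
truth    <- c("a", "b", "c", "a", "b", "c")  # character, 3 distinct values
clusters <- c(1L, 2L, 3L, 1L, 2L, 3L)        # kmeans-style cluster IDs

# Same recoding idea as in the question:
# cluster 1 becomes "2", every other cluster becomes "3"
recoded <- as.factor(ifelse(clusters == 1, "2", "3"))

class(truth)     # "character"
unique(truth)    # "a" "b" "c"
class(recoded)   # "factor"
levels(recoded)  # "2" "3"
```

One side is a character vector with three values and the other is a factor with two levels, so the two cannot be tabulated against each other by confusionMatrix.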

You also changed the 1 values to 2, and all other values to 3.
Did you have a reason to do that?
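
If the goal is simply to compare the clusters with the true labels, one possible approach is to name each cluster after its most common true class and then build both vectors as factors with identical levels. The sketch below assumes the built-in iris data as a stand-in for data_a.csv, and the names `features`, `truth`, `km`, `cluster_to_class` and `predicted` are illustrative, not from the original post. It also assumes caret and e1071 are installed.

```r
library(caret)  # provides confusionMatrix()

# Stand-in data: iris replaces data_a.csv, which is not available here
features <- iris[, 1:4]
truth    <- factor(iris$Species)

set.seed(123)
km <- kmeans(features, centers = 3, nstart = 25)

# Cluster IDs are arbitrary, so cross-tabulate clusters against the truth
# and name each cluster after the true class it contains most often
tab <- table(cluster = km$cluster, truth = truth)
cluster_to_class <- colnames(tab)[apply(tab, 1, which.max)]

# Predictions as a factor with exactly the same levels as the truth
predicted <- factor(cluster_to_class[km$cluster], levels = levels(truth))

# data = predictions, reference = true labels
confusionMatrix(data = predicted, reference = truth)
```

Note that a majority vote can assign two clusters to the same class. Passing levels = levels(truth) keeps all three levels in that case, so confusionMatrix still runs, but the table will show an empty row for the class no cluster was mapped to.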
