How can I compute a confusion matrix when the number of levels differs between two variables?

mmarion · April 7, 2023, 6:12pm

An example is given below. One thought is to remove D and E from the actual data and the corresponding predictions from the predicted data set. I would like to use the ! notation. A list of the row numbers of the data to be removed would be helpful. Can someone help? Thanks.

# Load the caret package
library(caret)

# Create some example data
actual <- factor(c("A", "D", "C", "A", "E", "C"))
predicted <- factor(c("A", "B", "C", "B", "C", "A"))

# Create the confusion matrix
confusion_matrix <- confusionMatrix(actual, predicted)

# Print the confusion matrix
print(confusion_matrix)

Error in confusionMatrix.default(actual, predicted) :
the data cannot have more levels than the reference

# Create the confusion matrix (arguments swapped)
confusion_matrix <- confusionMatrix(predicted, actual)

# Print the confusion matrix
print(confusion_matrix)

Error in confusionMatrix.default(predicted, actual) :
The data contain levels not found in the data.

AlexisW · April 7, 2023, 7:51pm

The number of levels is not directly relevant: for each observation, the confusion matrix only asks whether the predicted value is the same as the actual one or a different one. For your data:

actual      A D C A E C
predicted   A B C B C A
matches     1 0 1 0 0 0

So, for example, the accuracy is the number of matches divided by the total number of observations; here that is 2/6 = 0.33333...
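
As a quick check of that arithmetic, here is a minimal base R sketch. It assumes the values are kept as plain character vectors rather than factors, because comparing two factors with different level sets using == raises an error. The names matches and accuracy are just illustrative.

```r
# Same values as in the question, kept as character vectors
actual    <- c("A", "D", "C", "A", "E", "C")
predicted <- c("A", "B", "C", "B", "C", "A")

# TRUE where the prediction equals the actual value
matches <- actual == predicted

# Accuracy is the share of positions that match
accuracy <- mean(matches)
accuracy
```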

However, {caret} will refuse to compute this matrix if the two factors do not have identical levels. If this makes sense for your data, you can simply specify all the possible levels beforehand, whether or not they actually appear in the actual or predicted values:

possible_levels <- LETTERS[1:6]

actual <- factor(c("A", "D", "C", "A", "E", "C"),
                 levels = possible_levels)
predicted <- factor(c("A", "B", "C", "B", "C", "A"),
                    levels = possible_levels)

You'll note here that I added a level "F", which is not present in the data but could in principle exist. So you first need to decide, based on your knowledge of the context, what the possible levels are, rather than just letting R guess.
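
To see what sharing one level set buys you, here is a small sketch that builds the cross-tabulation with base R's table() instead of {caret}, so it runs without extra packages. The A to F level set is the same assumption as above, and cm is a name chosen for illustration. With identical levels on both factors, confusionMatrix(predicted, actual) from {caret} should also accept the input, although per-class statistics for levels that never occur may come back as NA.

```r
# Assumed full set of possible classes, as in the snippet above
possible_levels <- LETTERS[1:6]

actual <- factor(c("A", "D", "C", "A", "E", "C"),
                 levels = possible_levels)
predicted <- factor(c("A", "B", "C", "B", "C", "A"),
                    levels = possible_levels)

# Cross-tabulate predictions (rows) against actual values (columns);
# both factors share the same levels, so the table is square
cm <- table(predicted = predicted, actual = actual)
cm

# Correct predictions sit on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)
accuracy
```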

You can find more about factors and their levels in R for Data Science (r4ds).
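
If dropping the D and E rows really is what you want, as the original post suggests, the following is one possible sketch using the ! notation. It assumes the rows to remove are exactly those whose actual value is not among the levels of predicted; the names keep, removed_rows, actual_kept and predicted_kept are hypothetical. Bear in mind that removing rows changes the metrics (accuracy becomes 2 out of 4 here), so this only makes sense if those classes should genuinely be excluded from the evaluation.

```r
actual    <- factor(c("A", "D", "C", "A", "E", "C"))
predicted <- factor(c("A", "B", "C", "B", "C", "A"))

# Flag rows whose actual value is a class that appears among the prediction levels
keep <- actual %in% levels(predicted)

# Row numbers that would be removed (the D and E rows)
removed_rows <- which(!keep)
removed_rows

# Drop those rows from both vectors and put both on the same level set
shared_levels  <- levels(predicted)
actual_kept    <- factor(as.character(actual[keep]), levels = shared_levels)
predicted_kept <- factor(as.character(predicted[keep]), levels = shared_levels)

# Cross-tabulation on the remaining rows
table(predicted = predicted_kept, actual = actual_kept)
```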
