Confusion matrix with wrong dimensions

I'm new to programming and to R, and I'm having trouble producing a confusion matrix for my data. I am trying to reproduce, with a different data set, the example from Chapter 4 Lab: Logistic Regression (Section 4.6.2) of the book by Gareth James, Trevor Hastie, et al. I will not post the book link here because I don't know whether the rules allow it, but it is available for free download on the page of one of the authors (Gareth James). The confusion matrix I get is 1x2 instead of 2x2.

I am using a logistic regression model to predict whether or not a site will have a disease. The variable "cases" holds the number of cases in a given period, so I created a new variable (Cases2) that is 0 where there were no cases of the disease and 1 where there was at least one. Then I converted that variable to a factor, so that a site with at least one case is Yes (1) and a site with none is No (0).

Data$Cases2 <- factor(Data$Cases2, levels = c("0", "1"), labels = c("No", "Yes"))

Then I fitted the logistic regression and converted the predicted probabilities into class labels, Yes or No. The last two assignments below create a vector of class predictions based on whether the predicted probability of at least one case is greater than or less than 0.5.

glm.fit = glm(Cases2 ~ Precip + TempMa + TempMi + Humid, data = Data, family = binomial)
glm.probs = predict(glm.fit, type = "response")
contrasts(Data$Cases2)   # confirms that "Yes" is coded as 1
glm.pred = rep("No", 1250)
glm.pred[glm.probs > 0.5] = "Yes"
table(glm.pred, Data$Cases2)

And I get as a result

glm.pred  No Yes
      No 979 271

I did exactly what the book does, and I still get this problem. Can someone help?
What am I doing wrong? I appreciate any help!

Links, such as the one to the book, are welcome when they help illuminate a problem. If sound judgment does not screen out problematic links, the community will. Examples of problematic links include pirated copies, verbatim unpublished instructor assignments, and pages full of extraneous material. None of those problems appear here.

While links may be helpful, a minimal reproducible example is essential for all but questions of a general nature. See the FAQ: How to do a minimal reproducible example (reprex) for beginners.
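As a rough sketch of what sharing data for a reprex can look like, the snippet below builds a small stand-in data frame and prints it with base R's dput(), which emits R code that recreates the object when pasted into a post. The data frame and all of its values are invented for illustration; only the column names are taken from the question. The reprex package's reprex() function can then render the code together with its output.

```r
# Hypothetical stand-in for the questioner's data; every value is invented.
Data <- data.frame(
  Cases  = c(0, 3, 0, 1, 0, 2),
  Precip = c(12.1, 80.4, 5.0, 44.2, 0.0, 61.7),
  TempMa = c(31.2, 29.8, 33.1, 30.4, 34.0, 28.9),
  TempMi = c(20.1, 21.4, 19.8, 20.9, 18.7, 21.0),
  Humid  = c(55, 82, 48, 70, 41, 77)
)

# dput() prints R code that rebuilds the object, so readers can copy it
# straight from the post instead of retyping or downloading the data.
dput(head(Data, 5))
```

Pasting the printed structure(...) call after `Data <-` in the question gives everyone the same rows to work with.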

The following reprex uses the code from the linked lab to illustrate part of the problem: why output2 (the table in the question) does not look like output1 (the table in the book), by clearly showing how output1 is constructed. What is unclear is the derivation of output2.

suppressPackageStartupMessages({
  library(ISLR)
})

# Section 4.6.2, Gareth James et al., An Introduction to Statistical Learning, 1st ed.

attach(Smarket)
str(Smarket)
#> 'data.frame':    1250 obs. of  9 variables:
#>  $ Year     : num  2001 2001 2001 2001 2001 ...
#>  $ Lag1     : num  0.381 0.959 1.032 -0.623 0.614 ...
#>  $ Lag2     : num  -0.192 0.381 0.959 1.032 -0.623 ...
#>  $ Lag3     : num  -2.624 -0.192 0.381 0.959 1.032 ...
#>  $ Lag4     : num  -1.055 -2.624 -0.192 0.381 0.959 ...
#>  $ Lag5     : num  5.01 -1.055 -2.624 -0.192 0.381 ...
#>  $ Volume   : num  1.19 1.3 1.41 1.28 1.21 ...
#>  $ Today    : num  0.959 1.032 -0.623 0.614 0.213 ...
#>  $ Direction: Factor w/ 2 levels "Down","Up": 2 2 1 2 2 2 1 2 2 2 ...
glm.fits <- glm(formula = Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume, data = Smarket, family = binomial)
glm.probs <- predict(glm.fits, type = "response")
glm.pred = rep("Down", 1250)
glm.pred[glm.probs >.5] = "Up"
table(glm.pred, Direction)
#>         Direction
#> glm.pred Down  Up
#>     Down  145 141
#>     Up    457 507
detach(Smarket)

A further advantage of a reprex is that it spares readers from retyping the code from the link and avoids frustrations such as the ambiguity between ~ and ∼ .

I suggest using caret::confusionMatrix() or yardstick::conf_mat().
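One possible reading of the table in the question, not confirmed anywhere in this thread: its only row is No, which is what table() returns when no predicted probability exceeds 0.5 and the prediction vector is a plain character vector, because labels that never occur get no row. The sketch below uses simulated data (all values invented, and the probabilities are deliberately capped below 0.5) to show that behaviour, and how declaring both levels on a factor keeps the 2x2 layout. As far as I know, caret::confusionMatrix() also expects factors with matching levels, so the same conversion would be needed there.

```r
# Simulated stand-in for the questioner's data; every value is made up.
set.seed(1)
n <- 200
actual <- factor(sample(c("No", "Yes"), n, replace = TRUE, prob = c(0.8, 0.2)),
                 levels = c("No", "Yes"))

# Assumed fitted probabilities that never reach the 0.5 threshold.
probs <- runif(n, min = 0.05, max = 0.45)

# Character predictions: table() only creates rows for labels that occur.
pred_chr <- rep("No", n)
pred_chr[probs > 0.5] <- "Yes"
dim(table(pred_chr, actual))

# Factor predictions with both levels declared: the unused "Yes" level
# still gets a row, filled with zeros.
pred_fct <- factor(ifelse(probs > 0.5, "Yes", "No"), levels = c("No", "Yes"))
cm <- table(Predicted = pred_fct, Actual = actual)
cm

# A quick diagnostic on real output: how many probabilities pass the cutoff?
sum(probs > 0.5)
```

If that diagnostic is zero for the real glm.probs, the single-row table is a property of the fitted model and the threshold rather than a coding mistake, and summary(glm.probs) shows how far the predictions sit from 0.5.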
