I'm new to programming and R, and I'm having trouble building the confusion matrix for my data. I am trying to reproduce, with a different data set, the example from the book by Trevor Hastie et al., Chapter 4 Lab: Logistic Regression (Section 4.6.2). I will not post the book link here because I don't know whether the rules allow it, but the book is available as a free download on the page of one of the authors (Gareth James). The confusion matrix I get comes out as a 1x2 vector instead of a 2x2 matrix. I am using a logistic regression model to predict whether or not a site will have a disease. The variable "cases" holds the number of cases in a given period, so I created a new variable (Cases2) that is 0 where there were no cases of the disease and 1 where there was at least one. Then I converted the variable to a factor, so that at least one case at the site is Yes (1) and otherwise No (0).
Data$Cases2 <- factor(Data$Cases2, levels = c("0", "1"), labels = c("No", "Yes"))
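For readers following along, here is a minimal sketch of the recoding step described above. It assumes a data frame named Data with a count column named cases; the toy values are made up purely for illustration and are not the real data.

```r
# Toy data; these counts are hypothetical and only illustrate the recoding.
Data <- data.frame(cases = c(0, 3, 0, 1, 7))

# 1 where at least one case occurred in the period, 0 otherwise.
Data$Cases2 <- as.integer(Data$cases > 0)

# Convert to a factor with explicit levels and labels.
Data$Cases2 <- factor(Data$Cases2, levels = c(0, 1), labels = c("No", "Yes"))

# Count how many sites fall in each class.
table(Data$Cases2)
```

Checking table(Data$Cases2) at this point is a cheap way to confirm that both classes are actually present before fitting the model.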
Then I fitted the logistic regression model and converted the predicted probabilities into class labels, Yes or No. The following two commands create a vector of class predictions based on whether the predicted probability of an increase in cases is greater than or less than 0.5.
Links, such as the one to the book, are welcome when they help illuminate a problem. If sound judgment does not screen out problematic links, the community will. Examples of problematic links include pirated copies, verbatim unpublished instructor assignments, and links full of extraneous material. None of those problems applies here.
The following reprex uses the code from the linked book to illustrate part of the problem: why output2 does not look like output1. It does so by showing clearly how output1 is constructed. What remains unclear is how output2 was derived.
suppressPackageStartupMessages({
library(ISLR)
})
# Section 4.6.2, Gareth James et al., An Introduction to Statistical Learning, 1st ed.
attach(Smarket)
str(Smarket)
#> 'data.frame': 1250 obs. of 9 variables:
#> $ Year : num 2001 2001 2001 2001 2001 ...
#> $ Lag1 : num 0.381 0.959 1.032 -0.623 0.614 ...
#> $ Lag2 : num -0.192 0.381 0.959 1.032 -0.623 ...
#> $ Lag3 : num -2.624 -0.192 0.381 0.959 1.032 ...
#> $ Lag4 : num -1.055 -2.624 -0.192 0.381 0.959 ...
#> $ Lag5 : num 5.01 -1.055 -2.624 -0.192 0.381 ...
#> $ Volume : num 1.19 1.3 1.41 1.28 1.21 ...
#> $ Today : num 0.959 1.032 -0.623 0.614 0.213 ...
#> $ Direction: Factor w/ 2 levels "Down","Up": 2 2 1 2 2 2 1 2 2 2 ...
glm.fits <- glm(formula = Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume, data = Smarket, family = binomial)
glm.probs <- predict(glm.fits, type = "response")
glm.pred = rep("Down", 1250)
glm.pred[glm.probs >.5] = "Up"
table(glm.pred, Direction)
#>         Direction
#> glm.pred Down  Up
#>     Down  145 141
#>     Up    457 507
detach(Smarket)
A further advantage of a reprex is that it spares readers from retyping the code from the link and avoids frustrations such as the ambiguity between ~ and ∼.
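One possible explanation for a 1x2 result, offered only as an assumption since the code that produced it is not shown: if every fitted probability falls on the same side of 0.5, the prediction vector contains a single label, and table() has no second row to report. The sketch below uses made-up probabilities and hypothetical names (probs, actual, pred_chr, pred_fct) to show the effect, along with one way to keep both rows by storing the predictions as a factor with both levels declared.

```r
# Made-up fitted probabilities, all below 0.5 (hypothetical values).
probs  <- c(0.10, 0.25, 0.40, 0.05, 0.30, 0.45)
actual <- factor(c("No", "Yes", "No", "No", "Yes", "No"),
                 levels = c("No", "Yes"))

# Character predictions, built the same way as glm.pred in the reprex:
# only "No" ever appears, so the table has a single row.
pred_chr <- rep("No", length(probs))
pred_chr[probs > 0.5] <- "Yes"
tab_chr <- table(pred_chr, actual)
dim(tab_chr)

# Factor predictions with both levels declared keep the empty "Yes" row.
pred_fct <- factor(ifelse(probs > 0.5, "Yes", "No"), levels = c("No", "Yes"))
tab_fct <- table(pred_fct, actual)
dim(tab_fct)
```

If this turns out to be the cause, summary() of the fitted probabilities would show whether any of them exceed 0.5, which is worth checking before reading anything into the table.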