Logistic Regression using glmnet(): accuracy measure from mean() returns 0

Your example isn't reproducible, but it looks like your code is analogous to the example below. The outcome New_Product_Type has values of "1" or "0", but you're setting lasso_predict to values of "pos" or "neg". Since the actual and predicted labels never match, the count of "correct" predictions is always zero, even if the predictions are perfect (as they are in the example below).

# Actual outcomes
New_Product_Type <- c("1", "0", "0", "1", "1", "0")

# Predicted outcomes
lasso_predict <- c("pos", "neg", "neg", "pos", "pos", "neg")

New_Product_Type == lasso_predict
#> [1] FALSE FALSE FALSE FALSE FALSE FALSE
mean(New_Product_Type == lasso_predict)
#> [1] 0
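One way to fix this is to label the predictions with the same values as the actual outcomes. A minimal sketch, assuming lasso_prob is your vector of predicted probabilities:

# Use the outcome's own labels so actual and predicted values can match
lasso_predict <- ifelse(lasso_prob > 0.5, "1", "0")
mean(New_Product_Type == lasso_predict)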

A couple of other things:

First, the following line in your code determines the predicted classes.

lasso_predict[lasso_prob>.5] <- "pos"

After creating a confusion matrix for the predictions, you then run:

lasso_predict[lasso_prob>.8] <- "pos"

This doesn't change lasso_predict, because every prediction with probability greater than 0.5 was already set to "pos", and that includes every prediction with probability greater than 0.8. That's why both confusion matrices are the same. To get a confusion matrix for the second cutoff, reinitialize lasso_predict or create a new prediction vector (or reverse the order, applying the 0.8 cutoff before the 0.5 cutoff).
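For instance, a sketch that builds a separate prediction vector for each cutoff (again assuming lasso_prob holds the predicted probabilities):

# Separate vectors, so the 0.8 cutoff starts from "neg" rather than
# from the results of the 0.5 cutoff
lasso_predict_50 <- rep("neg", length(lasso_prob))
lasso_predict_50[lasso_prob > 0.5] <- "pos"

lasso_predict_80 <- rep("neg", length(lasso_prob))
lasso_predict_80[lasso_prob > 0.8] <- "pos"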

Using the confusion matrix, the accuracy is the sum of the diagonal divided by the sum of all four values (although accuracy isn't necessarily a particularly good measure of model performance; see, for example, here and here).
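For example, with the predictions relabeled to "1"/"0" as in the first sketch above:

# Accuracy is the sum of the diagonal over the total count
cm <- table(actual = New_Product_Type, predicted = lasso_predict)
sum(diag(cm)) / sum(cm)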

It might be easier to keep track of the various predictions by adding them as columns to the test data frame, rather than generating lots of stand-alone vector objects.
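A sketch, assuming your test set is a data frame named test:

# Store probabilities and class predictions alongside the test data;
# as.vector() guards against predict() returning a one-column matrix
test$prob    <- as.vector(lasso_prob)
test$pred_50 <- ifelse(test$prob > 0.5, "pos", "neg")
test$pred_80 <- ifelse(test$prob > 0.8, "pos", "neg")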

Second, for a classification model, "auc", "class", or "deviance" are better loss functions for type.measure than "mse".
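For example, with cv.glmnet() (x and y stand in for your predictor matrix and outcome vector):

library(glmnet)

# Cross-validate with misclassification rate instead of MSE;
# type.measure = "auc" or "deviance" work the same way
cv_fit <- cv.glmnet(x, y, family = "binomial", type.measure = "class")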
