Positive class in logistic regression

In the germancredit dataset, the target variable creditability is a factor whose levels are coded 1 = bad and 2 = good. The goal is to predict the bad outcome. Wouldn't it make sense for logistic regression to code good = 0 and bad = 1? I don't see writers doing this. Thank you.

library(scorecard)
data("germancredit")
df <- data.frame(germancredit)
table(df$creditability)
str(df$creditability)

bad good
300 700
Factor w/ 2 levels "bad","good": 2 1 2 2 1 2 2 2 2 1 ...

When it comes to outcomes, 'good' and 'bad' are easier for humans to interpret than '0' and '1'.

Thanks nirgahamuk. But for logistic regression, don't I want to predict default with Pr(Y=1 | x)? So shouldn't I recode creditability as good=0 and bad=1?

And am I most interested in low False Positives, from the confusion matrix in Specificity = TN/(TN + FP)?
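
On the recoding question: base R's glm() with a factor response and family = binomial treats the first factor level as "failure" and the other level as "success", so with levels "bad","good" it models Pr(good) unless you change something. Below is a minimal sketch on toy data (the data frame `toy` and its `income` column are invented for illustration and are not part of germancredit) showing two equivalent ways to make "bad" the modelled event.

```r
# Toy data standing in for germancredit; all values are invented.
set.seed(1)
toy <- data.frame(income = rnorm(200))
p_bad <- plogis(-1 - 1.5 * toy$income)   # lower income -> higher chance of "bad"
toy$creditability <- factor(ifelse(runif(200) < p_bad, "bad", "good"),
                            levels = c("bad", "good"))

# Default behaviour: first level ("bad") is failure, so this models Pr(good | x).
fit_default <- glm(creditability ~ income, data = toy, family = binomial)

# Option 1: explicit 0/1 recode with bad = 1.
toy$bad01 <- as.integer(toy$creditability == "bad")
fit_num <- glm(bad01 ~ income, data = toy, family = binomial)

# Option 2: keep the labels but make "good" the reference (first) level,
# so that "bad" becomes the success class.
toy$cred_rl <- relevel(toy$creditability, ref = "good")
fit_fac <- glm(cred_rl ~ income, data = toy, family = binomial)

# Options 1 and 2 both model Pr(bad | x) and should agree;
# the default fit should have the same coefficients with flipped signs.
all.equal(coef(fit_num), coef(fit_fac))
all.equal(coef(fit_default), -coef(fit_num), tolerance = 1e-6)
```

Either option works; the relevel() route keeps the human-readable labels while still giving Pr(bad | x).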

Following the example from the scorecard documentation: by default they expect you to tell them 'good'/'bad', which they then map to 0/1 numbers for you.

library(scorecard)
data("germancredit")
dt_f = var_filter(germancredit, y="creditability")
table(germancredit$creditability)

 bad good 
 300  700 
table(dt_f$creditability)

  0   1 
700 300 

You can see this in the documentation for their var_filter function, which asks you to identify the "positive" class label:

positive	
Value of positive class, Defaults to "bad|1".

Ah, var_filter defaults to bad = 1.

Thank you.

Do you agree that I am most interested in low False Positives, from the confusion matrix in Specificity = TN/(TN + FP)?

I think it depends on what you are doing. For example, a credit risk department might be focused on risk-averse practices, so they might care most about accurately identifying bad credit risks in order to avoid that lending. In that case sensitivity will be a key metric.
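
To make the metrics concrete, here is a small sketch that builds a confusion matrix with "bad" as the positive class and computes sensitivity and specificity from it. The `actual` and `predicted` vectors are hypothetical labels made up for illustration, not model output from germancredit.

```r
# Hypothetical labels; "bad" is treated as the positive class.
actual    <- factor(c("bad", "bad", "bad", "good", "good",
                      "good", "good", "good", "bad", "good"),
                    levels = c("bad", "good"))
predicted <- factor(c("bad", "good", "bad", "good", "good",
                      "bad", "good", "good", "bad", "good"),
                    levels = c("bad", "good"))

cm <- table(predicted = predicted, actual = actual)

TP <- cm["bad",  "bad"]    # predicted bad, actually bad
FN <- cm["good", "bad"]    # predicted good, actually bad (missed default)
FP <- cm["bad",  "good"]   # predicted bad, actually good (good customer turned away)
TN <- cm["good", "good"]   # predicted good, actually good

sensitivity <- TP / (TP + FN)   # share of actual bads that were caught
specificity <- TN / (TN + FP)   # share of actual goods that were passed
fnr         <- FN / (TP + FN)   # false negative rate, equal to 1 - sensitivity

c(sensitivity = sensitivity, specificity = specificity, fnr = fnr)
```

With "bad" as the positive class, few false negatives means high sensitivity, and few false positives means high specificity.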

Perhaps a department like pricing will take a more holistic view, but it's most likely that they won't use raw statistical metrics in deciding key thresholds and will instead want to incorporate cost estimates for each of TP/TN/FP/FN,
i.e. the benefit of identifying a bad risk and avoiding them on a loan of a certain size
vs. the cost of misidentifying a good as a bad and forgoing that income.
Then the thresholds for classifying good and bad can be set from a pricing perspective rather than a straight risk one, so it depends on your goals.
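
As a rough sketch of that cost-based idea: given predicted probabilities of "bad", you can scan candidate cutoffs and pick the one with the lowest total cost. Everything here is an assumption for illustration: the simulated scores `p_bad`, the labels `is_bad`, and the cost figures `cost_fn` and `cost_fp` (with TP and TN assumed to cost nothing). Real costs would come from loan sizes and margins.

```r
# Simulated data: is_bad is the true label (1 = bad), p_bad a predicted Pr(bad).
set.seed(42)
n <- 1000
is_bad <- rbinom(n, 1, 0.3)
p_bad  <- plogis(rnorm(n, mean = ifelse(is_bad == 1, 0.5, -1.5)))

# Assumed per-loan costs, purely illustrative.
cost_fn <- 5   # approved a borrower who then defaulted
cost_fp <- 1   # rejected a borrower who would have repaid

# Total cost of classifying as "bad" whenever p_bad >= threshold.
thresholds <- seq(0.05, 0.95, by = 0.05)
total_cost <- sapply(thresholds, function(t) {
  pred_bad <- p_bad >= t
  fn <- sum(!pred_bad & is_bad == 1)
  fp <- sum(pred_bad & is_bad == 0)
  cost_fn * fn + cost_fp * fp
})

best_threshold <- thresholds[which.min(total_cost)]
best_threshold
```

When a missed default is assumed to cost more than a turned-away good customer, the cost-minimizing cutoff tends to sit below 0.5, trading some specificity for sensitivity.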

The lender is most concerned about minimizing false negatives: predicted no default, but actually defaulted. Is sensitivity the only measure for this?
