In the germancredit dataset, the target variable creditability has values 1 = bad and 2 = good. The goal is to predict bad credit. Wouldn't it make sense for logistic regression to code good = 0 and bad = 1? I don't see authors doing this, though. Thank you.
Thanks nirgahamuk. But for logistic regression, don't I want to predict default with Pr(Y=1 | x)? So shouldn't I recode creditability as good = 0 and bad = 1?
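A small sketch of why the coding choice does not change the fitted model, only the direction of the coefficients. It assumes a plain logistic model; the coefficients and the applicant `x` are made up for illustration. Because sigmoid(-z) = 1 - sigmoid(z), a model fitted with good = 1 has exactly the negated coefficients of one fitted with bad = 1, so Pr(bad | x) and Pr(good | x) always sum to 1.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps a linear score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_score(beta0: float, betas: list, x: list) -> float:
    """Intercept plus the dot product of coefficients and features."""
    return beta0 + sum(b * xi for b, xi in zip(betas, x))


# Hypothetical coefficients of a model fitted with bad = 1 (illustration only)
beta0, betas = -1.2, [0.8, -0.5]
x = [1.5, 2.0]  # hypothetical applicant features

# Pr(bad | x) under the bad = 1 coding
p_bad = sigmoid(linear_score(beta0, betas, x))

# Refitting with good = 1 negates every coefficient, giving Pr(good | x)
p_good = sigmoid(linear_score(-beta0, [-b for b in betas], x))

print(round(p_bad, 4), round(p_good, 4), round(p_bad + p_good, 10))
```

So recoding to bad = 1 is mostly about readability: positive coefficients then mean "higher risk of default".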
And am I most interested in a low number of false positives, i.e. high specificity = TN/(TN + FP) from the confusion matrix?
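To make the terms concrete, here is a minimal sketch that counts the confusion-matrix cells with bad = 1 as the positive class, using invented toy labels (not germancredit data). Under that coding a false positive is a good applicant flagged as bad (lost business) and a false negative is a bad applicant who gets approved; specificity tracks the first kind of error and sensitivity the second.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN), treating `positive` (here: bad = 1) as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # bad applicant correctly flagged as bad
            else:
                fp += 1  # good applicant wrongly flagged as bad
        else:
            if t == positive:
                fn += 1  # bad applicant wrongly approved
            else:
                tn += 1  # good applicant correctly approved
    return tp, fp, tn, fn


# Toy labels for illustration only: 1 = bad, 0 = good
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)  # share of bad applicants that are caught
specificity = tn / (tn + fp)  # share of good applicants that are approved
print(tp, fp, tn, fn, sensitivity, specificity)
```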
Following the example in the scorecard documentation: by default it expects you to supply the labels 'good'/'bad', which it then maps to 0/1 numbers for you.
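For anyone doing the same outside that package, a rough Python analogue of such a mapping might look like the sketch below. This is not the scorecard API: `encode_target` and its `positive`/`negative` defaults are hypothetical, and it assumes you want 'bad' to end up as 1, matching the Pr(Y=1 | x) = Pr(default | x) reading from the question.

```python
def encode_target(labels, positive=("bad", "1"), negative=("good", "0")):
    """Hypothetical helper: map target labels to 0/1, with `positive` labels -> 1.

    Matching is case-insensitive; any unrecognised label raises ValueError
    instead of being silently assigned to one class.
    """
    pos = {s.lower() for s in positive}
    neg = {s.lower() for s in negative}
    encoded = []
    for label in labels:
        key = str(label).strip().lower()
        if key in pos:
            encoded.append(1)
        elif key in neg:
            encoded.append(0)
        else:
            raise ValueError(f"unexpected target label: {label!r}")
    return encoded


y = encode_target(["good", "bad", "Good", "bad"])
print(y)
```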
I think it depends on what you are doing. For example, a credit risk department might be focused on risk-averse practices, so it would care most about accurately identifying bad credit risks in order to avoid lending to them. With bad = 1 as the positive class, that makes sensitivity = TP/(TP + FN), the share of bad applicants that are caught, a key metric.
Perhaps a department like pricing will take a more holistic view. It's unlikely that they would set key thresholds from raw statistical metrics alone; more likely they would incorporate cost estimates for each of TP/TN/FP/FN,
i.e. the benefit of identifying a bad risk and avoiding it on a loan of a certain size
vs. the cost of misidentifying a good applicant as bad and forgoing that income.
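One way to turn those cost estimates into a cut-off, sketched under simplifying assumptions: let p = Pr(bad | x) from the logistic model (assumed well calibrated), and let C_TP, C_FP, C_TN, C_FN be per-applicant costs of each outcome (symbols introduced here for illustration). Flagging an applicant as bad is cheaper in expectation than approving them when

```latex
\begin{aligned}
p\,C_{TP} + (1-p)\,C_{FP} &< p\,C_{FN} + (1-p)\,C_{TN} \\
\iff (1-p)\,(C_{FP} - C_{TN}) &< p\,(C_{FN} - C_{TP}) \\
\iff p &> p^{*} = \frac{C_{FP} - C_{TN}}{(C_{FP} - C_{TN}) + (C_{FN} - C_{TP})}
\end{aligned}
```

The last step assumes the denominator is positive, i.e. mistakes cost more than correct decisions. If correct decisions cost 0, this reduces to p* = C_FP / (C_FP + C_FN); for example, if approving a bad loan costs five times the income forgone by rejecting a good one, p* = 1/6.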
The threshold separating good from bad can then be set from a pricing perspective rather than a purely risk-based one, so it depends on your goals.
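When the predicted probabilities are not trusted to be calibrated, a common alternative is to pick the cut-off empirically on validation data by minimising total misclassification cost. The sketch below assumes correct decisions cost 0 and uses invented toy scores and labels; `total_cost` and `best_threshold` are hypothetical helpers, not functions from any package.

```python
import math


def total_cost(y_true, p_bad, threshold, cost_fp, cost_fn):
    """Total cost of flagging every applicant with p_bad >= threshold as bad.

    Assumes correct decisions (TP, TN) cost 0.
    """
    cost = 0.0
    for y, p in zip(y_true, p_bad):
        flagged_bad = p >= threshold
        if flagged_bad and y == 0:
            cost += cost_fp  # good applicant rejected: forgone income
        elif not flagged_bad and y == 1:
            cost += cost_fn  # bad applicant approved: credit loss
    return cost


def best_threshold(y_true, p_bad, cost_fp, cost_fn):
    """Cut-off with the lowest total cost (ties go to the smallest cut-off)."""
    # Candidates: every observed score, plus +inf meaning "flag nobody"
    candidates = sorted(set(p_bad)) + [math.inf]
    return min(candidates, key=lambda t: total_cost(y_true, p_bad, t, cost_fp, cost_fn))


# Toy validation data, invented for illustration: 1 = bad, 0 = good
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
p_bad = [0.9, 0.8, 0.6, 0.4, 0.3, 0.25, 0.2, 0.1]

t_equal = best_threshold(y_true, p_bad, cost_fp=1.0, cost_fn=1.0)
t_risk_averse = best_threshold(y_true, p_bad, cost_fp=1.0, cost_fn=5.0)
print(t_equal, t_risk_averse)
```

With these toy numbers, raising the cost of approving a bad applicant from 1 to 5 moves the chosen cut-off from 0.6 down to 0.25, so more applicants are flagged as bad: the risk-averse setting trades forgone income for fewer credit losses.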