For classification, ROC curve analysis is conducted on each predictor. For two-class problems, a series of cutoffs is applied to the predictor data to predict the class. The sensitivity and specificity are computed at each cutoff, and the area under the resulting ROC curve is calculated with the trapezoidal rule. This area is used as the measure of variable importance. For multi-class outcomes, the problem is decomposed into all pairwise problems and the area under the curve is calculated for each class pair (i.e. class 1 vs. class 2, class 2 vs. class 3, etc.). For a specific class, the maximum area under the curve across the relevant pairwise AUCs is used as the variable importance measure.
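In caret, this filter-based importance is available through the `filterVarImp` function. A minimal sketch using the built-in `iris` data (the column selection here is illustrative):

```r
library(caret)
data(iris)

# AUC-based importance for each predictor. Because Species has three
# classes, each class's score is the maximum pairwise AUC involving it.
roc_imp <- filterVarImp(x = iris[, -5], y = iris$Species)
head(roc_imp)
```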
For data with two classes, there are specialized functions for measuring model performance. First, the twoClassSummary function computes the area under the ROC curve along with the sensitivity and specificity at the default 50% probability cutoff. Note that:
this function uses the first class level to define the "event" of interest. To change this, use the lev argument to the function
there must be columns in the data for each of the class probabilities (named the same as the outcome's class levels)
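The `test_set` object used below is assumed to be a data frame in the layout these summary functions expect: an `obs` column of observed classes, a `pred` column of predicted classes, and one probability column per class level. A sketch of such a frame with hypothetical level names `"Class1"` and `"Class2"`:

```r
library(caret)
set.seed(1)

# Simulated class probabilities for 100 hypothetical samples
prob <- runif(100)

# obs/pred are factors with the same levels; the probability columns
# are named after those levels, as twoClassSummary requires
test_set <- data.frame(
  obs  = factor(sample(c("Class1", "Class2"), 100, replace = TRUE),
                levels = c("Class1", "Class2")),
  pred = factor(ifelse(prob > 0.5, "Class1", "Class2"),
                levels = c("Class1", "Class2")),
  Class1 = prob,
  Class2 = 1 - prob
)
```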
twoClassSummary(test_set, lev = levels(test_set$obs))
## ROC Sens Spec
## 0.9560044 0.9336735 0.8246269
A similar function can be used to get the analogous precision-recall values and the area under the precision-recall curve:
prSummary(test_set, lev = levels(test_set$obs))
## AUC Precision Recall F
## 0.8582695 0.5648148 0.9336735 0.7038462
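The F column is the F1 score, the harmonic mean of precision and recall (caret's F measure defaults to beta = 1). The reported values are consistent with that definition:

```r
precision <- 0.5648148
recall    <- 0.9336735

# F1 = harmonic mean of precision and recall
f1 <- 2 * precision * recall / (precision + recall)
f1  # approximately 0.7038, matching the prSummary output above
```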