ROC Curve Step by Step

...with the precision-recall curve thrown in

The ROC (Receiver Operating Characteristic) curve dates back to World War II, where it described how well radar operators, reading their radar scopes, correctly identified enemy aircraft. Today, it describes how quickly a classifier picks up true positive cases, or how consistently it ranks the positive cases above the negative ones.

A couple of approaches

For starters, we would rank the predictions in decreasing order of probability. As we go down the list, we would expect the probabilities of the labelled positive cases to be consistently ranked above those of the labelled negative cases. To plot the ROC curve, we would go down the list, say, ten ranks at a time, and keep track of the True Positive Rate (TPR) and the False Positive Rate (FPR), thus generating a series of points on the curve with the coordinates (FPR, TPR).
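
A minimal sketch of this rank-based bookkeeping, assuming a label array y_true and matching predicted probabilities probs (names chosen here purely for illustration; the step of ten ranks is arbitrary), might look like this:

import numpy as np

def roc_points_by_rank(y_true, probs, step=10):
    # Sort the labels by decreasing predicted probability
    order = np.argsort(probs)[::-1]
    y_sorted = np.asarray(y_true)[order]
    P = y_sorted.sum()            # total positives
    N = len(y_sorted) - P         # total negatives
    points = []
    # Walk down the ranked list a few ranks at a time, counting how many
    # positives (TP) and negatives (FP) we have passed so far
    for k in range(step, len(y_sorted) + 1, step):
        tp = y_sorted[:k].sum()
        fp = k - tp
        points.append((fp / N, tp / P))   # (FPR, TPR)
    return points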

Another approach, which gives a similar result, would be to:

  • get the list of prediction probabilities from the classifier,
  • repeatedly threshold this list at a range of probability cut-offs from high (strict) to low (lax), say 0.95 down to 0.05, to generate predicted labels,
  • use the predicted labels at each step to generate TPR and FPR.

We will use the latter approach to generate our ROC curve.

The implementation

We will use scikit-learn and other libraries to generate our dataset, do the plotting, etc. However, the generation of points for the ROC curve – our purpose here – will be implemented explicitly. Along the way, we will also generate points for a precision-recall curve and plot that as well.

Generate the dataset

We use scikit-learn to generate the data because it gives us the flexibility to shape the dataset to our requirements. We have deliberately introduced a class imbalance.

#%% Import modules

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.linear_model import LogisticRegression

#%% Utility functions
def get_auc(X,Y):
    # Area under the curve (X,Y) by the trapezium rule:
    # the area of one sliver is (x2-x1)*(y2+y1)/2, and the total
    # area is the sum of the slivers over X_1..X_n.
    # Both X and Y are expected to lie within [0,1].
    X_diff = np.diff(X)
    Y_sum = (Y + np.roll(Y,1))[1:]
    return np.sum(X_diff * Y_sum / 2)

#%% Create dataset

X,y = make_classification(n_samples=10000,n_features=2,n_informative=2,
                          n_redundant=0,n_repeated=0,class_sep=1.0,
                          flip_y=0.1, # anomalous labels
                          weights=[0.9,0.1], # Create class imbalance
                          random_state=21) 

sns.scatterplot(x=X[:,0],y=X[:,1],hue=y)
plt.title("Generated Dataset")
plt.xlabel("$X_0$")
plt.ylabel("$X_1$")

plt.show()
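
As a quick sanity check of get_auc (illustrative only, not part of the original listing), the same trapezium rule is implemented by numpy's np.trapz (renamed np.trapezoid in newer NumPy releases), so the two should agree on a toy curve:

#%% Sanity check for get_auc (illustrative)
xs = np.linspace(0, 1, 11)
ys = xs**2
print(get_auc(xs, ys), np.trapz(ys, xs))  # both ~0.335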

Fit the data to a classifier

We choose a Logistic Regression model to fit our data. This is a simple model that plays well with linearly separable data. However, given our ‘engineered’ dataset, it is slightly challenged, thus giving us a distinct ROC curve.

#%% Train classifier

clf = LogisticRegression()
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.4)

clf.fit(X_train,y_train)

accuracy = clf.score(X_test,y_test)

Compute coordinates for ROC curve

The traditional confusion matrix computed for a classifier prediction usually assumes a probability threshold of 0.5. A classifier probability greater than this threshold means the classifier is more confident that a particular instance belongs to class 1 than to class 0.
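
For a binary LogisticRegression this default can be checked directly; the snippet below is an illustrative aside (not part of the original listing) comparing .predict() with thresholding the class-1 probability at 0.5:

#%% Check the default 0.5 threshold (illustrative)
probs_1 = clf.predict_proba(X_test)[:,1]
manual_preds = (probs_1 > 0.5).astype(int)
# Expected to print True (barring ties at exactly 0.5)
print(np.array_equal(manual_preds, clf.predict(X_test)))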

To get the ROC curve, which is a more dynamic view of the classification, we compute a series of (virtual) confusion matrices each corresponding to a point on the ROC curve.

#Get probabilities for class 1
clf_probs = clf.predict_proba(X_test)[:,1]

#%% Get FPR and TPR values

# Thresholds to use for plotting
thresh = np.linspace(0.95,0.00,num=20,endpoint=True)

# For each threshold, get predictions, a confusion matrix and
# FPR and TPR values. These FPR, TPR tuples are our points for
# the ROC plot.

# Lists to hold FPR, TPR, precision and accuracy values
fpr = []
tpr = []
precision = []
acc = []

for t in thresh:
    clf_preds = (clf_probs > t).astype(int)
    tn,fp,fn,tp = confusion_matrix(y_test,clf_preds).ravel()
    fpr.append(fp/(fp+tn))
    precision.append(tp/(tp+fp))
    tpr.append(tp/(tp+fn))
    acc.append((tp+tn)/(tp+tn+fp+fn)) 
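
The hand-computed points can be cross-checked against scikit-learn's built-in implementations; a possible comparison (not part of the original listing) is sketched below. Small differences are expected, since sklearn uses every distinct probability as a threshold rather than our fixed grid of 20.

#%% Optional cross-check against sklearn (illustrative)
from sklearn.metrics import roc_curve, roc_auc_score, precision_recall_curve

fpr_sk, tpr_sk, thr_sk = roc_curve(y_test, clf_probs)
prec_sk, rec_sk, _ = precision_recall_curve(y_test, clf_probs)
print(f"sklearn ROC AUC: {roc_auc_score(y_test, clf_probs):0.2f}")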

Plot the curves

Each point on the curve corresponds to a probability threshold. We sweep this threshold from 0.95 down to 0.00 in steps of 0.05. The marker sizes reflect this threshold.

#%% Plot ROC curve

plt.plot(fpr,tpr,label=f'ROC AUC:{get_auc(fpr,tpr):0.2f}')
plt.scatter(fpr,tpr,s=thresh*200,alpha=0.4) # superimpose markers scaled on prob threshold
plt.scatter([fpr[9]],[tpr[9]],c='r',s=100)  # red dot for 0.5 threshold

plt.plot(tpr,precision, label=f'PR AUC:{get_auc(tpr,precision):0.2f}')
plt.scatter(tpr,precision,s=thresh*200,alpha=0.4)
plt.scatter([tpr[9]],[precision[9]],c='r',s=100)  # red dot for 0.5 threshold

plt.plot(thresh,acc,'c^--', markersize=4, label='acc vs thresh')

plt.text(0.3,0.4,'Marker sizes denote prob. thresholds')
plt.text(0.3,0.35,'Red marker denotes 0.5 prob. threshold')
plt.plot((0.0,1.0),(0.0,1.0),'k:',label='ROC dummy')

plt.title(f'ROC & PR Curves accuracy:{accuracy:0.2f}')
plt.xlabel('FPR(ROC) or Recall(PR)')
plt.ylabel('TPR(ROC) or Precision(PR)')
plt.grid()
plt.legend(loc='lower right')
plt.show()

[Figures: the generated dataset scatter plot and the combined ROC / PR / accuracy plot]

Interpreting the curves

We have plotted three curves: the ROC curve and the PR curve are the traditional metrics, while the third (accuracy) curve tracks the most common metric for a classifier.

A red marker designates the probability threshold of 0.5. This is the threshold at which the .predict() method of a classifier makes its predictions by default. For the PR curve, the ‘knee’ portion corresponds to the threshold that offers the best precision/recall tradeoff; for the given plots, this knee sits at a threshold of ~0.45.
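
One way to locate such a knee numerically is to pick the threshold that maximises the F1 score (using F1 as the tradeoff criterion is our choice here, not something the plot itself dictates), reusing the lists computed earlier:

#%% Locate the PR 'knee' via F1 (illustrative)
p = np.array(precision)
r = np.array(tpr)               # recall
f1 = 2 * p * r / (p + r)        # may contain NaN where p = r = 0
best = np.nanargmax(f1)
print(f"best threshold ~{thresh[best]:0.2f}, F1 ~{f1[best]:0.2f}")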

The ROC and PR curves can be used to compare classifier performance on the same dataset using the Area Under Curve (AUC) metric. The greater the AUC for a particular classifier, the better the classifier is deemed to be. The dotted diagonal line denotes the dummy classifier that randomly selects positive cases. It has an AUC of 0.5.
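
That 0.5 baseline can be reproduced with scikit-learn's DummyClassifier; a possible check (not part of the original code) might be:

#%% Dummy classifier baseline (illustrative)
from sklearn.dummy import DummyClassifier
from sklearn.metrics import roc_auc_score

dummy = DummyClassifier(strategy="stratified", random_state=21)
dummy.fit(X_train, y_train)
dummy_probs = dummy.predict_proba(X_test)[:,1]
print(f"dummy ROC AUC: {roc_auc_score(y_test, dummy_probs):0.2f}")  # ~0.5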

Despite mediocre ROC-AUC and PR-AUC values, the accuracy curve hovers around an impressive ~0.9. This is not surprising, considering that accuracy is a misleading and overly optimistic metric for imbalanced datasets.
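
A quick way to see this (an illustrative check, not in the original code) is to compare against the accuracy of always predicting the majority class, which is already close to 0.9 for our roughly 9:1 imbalance:

#%% Majority-class accuracy baseline (illustrative)
majority_acc = max(np.mean(y_test), 1 - np.mean(y_test))
print(f"majority-class accuracy: {majority_acc:0.2f}")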