Likelihood ratio: keeping your classifier honest

Is a respectable accuracy score enough?


Imagine training a classifier on a dataset, only to find that a friend guessing the target label, without ever looking at the data, does almost as well. Is your classifier any good then?

Accuracy is often touted as the key figure of merit for a classifier. Yet we also know that accuracy by itself is useless when the classes are highly imbalanced.

So when the class imbalance is only moderate and the accuracy score is fairly high, how do we find out whether our classifier is actually any good?

Likelihood ratio

The likelihood ratio tells us how much our classifier improves the odds of identifying a truly positive case over a random (dummy) classifier. In other words, does the probability that a case is actually positive increase when our classifier labels it positive, compared to a random classifier picking a case as positive?

Odds are calculated from the prevalence (the probability of a positive case) as follows:

$$ \frac{probability}{1-probability} $$
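As a quick sketch, this conversion is a one-liner; the 63% prevalence below is taken from the case study that follows (the helper name `odds` is mine):

```python
def odds(p):
    """Convert a probability (e.g. prevalence) into odds."""
    return p / (1 - p)

# With the ~63% prevalence from the case study, the pre-test odds are:
print(round(odds(0.63), 2))  # ≈ 1.7
```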

Case study

Consider a confusion matrix of a classifier from my Kaggle notebook for the Engine Health dataset.

There is a mild class imbalance of 63/37, and we get an accuracy of around 66%. This might sound acceptable, even surprising, given that none of the features (refer to the Kaggle notebook for the EDA) is distinctive enough to discriminate between the classes.

The revelation

If you refer to the Kaggle notebook, you will note that we also added a dummy classifier to our pipelines, and it achieved an accuracy of 63%. That alone warns us that our classifier may not be any good.
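To see why the dummy baseline sits at exactly 63%, note that a classifier that always predicts the most frequent class scores an accuracy equal to the prevalence. A minimal sketch (the labels here are synthetic, chosen to match the 63/37 split):

```python
from collections import Counter

def majority_class_accuracy(y):
    """Accuracy of a dummy classifier that always predicts the most frequent class."""
    counts = Counter(y)
    return counts.most_common(1)[0][1] / len(y)

# With a 63/37 class split, always guessing the majority class scores 63%:
y = [1] * 63 + [0] * 37
print(majority_class_accuracy(y))  # 0.63
```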

Calculating the likelihood ratio tells us the real story: how much better our classifier is than a random one, if at all.

The likelihood ratio is very easy to calculate from a confusion matrix. It is given by:

$$ \frac{tp/(tp+fn)}{fp/(fp+tn)} $$

Intuitively, it is the true positive rate (the fraction of positive cases the classifier catches, something desirable) divided by the false positive rate (the fraction of negative cases that leak through as positives, something undesirable). The higher this quotient, the better our classifier is compared to a random guess.

For the current case, we get a likelihood ratio of (3239/(3239+456))/(1546/(1546+620)) or roughly 1.2.
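The same calculation, done in code from the confusion-matrix counts above (the function name `positive_likelihood_ratio` is mine):

```python
def positive_likelihood_ratio(tp, fn, fp, tn):
    """LR+ = true positive rate divided by false positive rate."""
    tpr = tp / (tp + fn)   # sensitivity: fraction of positives caught
    fpr = fp / (fp + tn)   # fraction of negatives misclassified as positive
    return tpr / fpr

# Confusion-matrix counts from the case study:
lr_plus = positive_likelihood_ratio(tp=3239, fn=456, fp=1546, tn=620)
print(round(lr_plus, 2))  # ≈ 1.23
```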

This means our classifier improves the odds by a factor of only about 1.2 – hardly any improvement. We might be better off simply predicting the majority engine condition every time, and we would be right 63% of the time!

The upshot

So, if we have two classifiers of comparable accuracy, we should pick the one with the higher likelihood ratio.