Regressor as Classifier

... it works better than you think.


A classifier predicts a categorical target variable while a regressor predicts a continuous response variable. Can we fit a regressor as a classifier? Let’s find out.

Experiment

Create the dataset

Create a dataset with the sklearn make_classification() function. We keep the dataset fairly easy so that extraneous factors do not creep into the experiment.

from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import seaborn as sns
import matplotlib.pyplot as plt
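
# Create dataset: two informative features, two classes with a 60/40 split.
# flip_y is the fraction of samples whose label is assigned at random;
# lower values give cleaner, more separable classes.
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, n_repeated=0, class_sep=2.0,
                           flip_y=0.5,
                           weights=[0.6, 0.4],
                           random_state=None)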


sns.scatterplot(x=X[:,0],y=X[:,1],hue=y)
plt.title("Generated Dataset")
plt.xlabel("$X_0$")
plt.ylabel("$X_1$")

plt.show()

Figure: Generated dataset

Fit the classifier

Fit the classifier using the sklearn LogisticRegression class. Split the data into training and test sets, fit the classifier on the training set, and predict class probabilities on the test set.

clf = LogisticRegression()

# Hold out 40% of the samples as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4)

# Train the model on the training set
clf.fit(X_train, y_train)

# Optional: mean accuracy on the test set
# accuracy = clf.score(X_test, y_test)

# Predict class probabilities on the test set; column 1 holds P(y = 1)
y_pred_proba = clf.predict_proba(X_test)

Fit the regressor

Fit the regressor using the sklearn LinearRegression class. Fit it on the same training set, treating the 0/1 labels as a numeric target, and then predict on the test set.

regr = LinearRegression()

# Training the model using the training set
regr.fit(X_train,y_train)

# Making the predictions using the test set
y_pred = regr.predict(X_test)

Get the prediction probabilities and plot them

Once we have the predicted probabilities from the classifier and the scores (predicted response values) from the regressor, we plot them against each other, with the regressor scores on the x-axis and the classifier's probability for class 1 on the y-axis.

sns.scatterplot(x=y_pred, y=y_pred_proba[:,1],hue=y_test)

plt.title("Classification dataset fitted to a regressor")
plt.xlabel("Regressor predictions")
plt.ylabel("Classifier probabilities")
plt.grid()
plt.show()

Figure: Regressor as a classifier

We also plot the distribution of scores in both cases (classifier and regressor).

fig, ax = plt.subplots(1,2, figsize=(10,6))

ax[0].hist(y_pred)
ax[0].set(title="Predicted regression distribution",
          xlabel="Regressor predicted values",
          ylabel="Count")
ax[0].grid()

ax[1].hist(y_pred_proba[:,1])  # Just the probability for class 1

ax[1].set(title="Predicted classification distribution",
          xlabel="Classification predicted values",
          ylabel="Count")
ax[1].grid()

plt.tight_layout()
plt.show()

Figure: Prediction distribution
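The two histograms are not on the same scale, which is worth checking numerically. The sketch below regenerates a dataset with the settings used above and prints the range of each set of predictions. Fixing random_state=0 is an assumption made here for reproducibility, and the name share_outside is only illustrative. predict_proba() returns values in [0, 1], whereas nothing constrains LinearRegression predictions to that interval, so they are better read as scores than as probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

# Same generator settings as in the article; random_state=0 is an assumption
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, n_repeated=0, class_sep=2.0,
                           flip_y=0.5, weights=[0.6, 0.4], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

# Probability of class 1 from the classifier, raw scores from the regressor
proba = LogisticRegression().fit(X_train, y_train).predict_proba(X_test)[:, 1]
scores = LinearRegression().fit(X_train, y_train).predict(X_test)

# Classifier probabilities are bounded; regressor scores need not be
print(f"classifier: min={proba.min():.3f}, max={proba.max():.3f}")
print(f"regressor:  min={scores.min():.3f}, max={scores.max():.3f}")

# Fraction of regressor scores that fall outside the [0, 1] interval
share_outside = np.mean((scores < 0) | (scores > 1))
print(f"regressor scores outside [0, 1]: {share_outside:.1%}")
```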

Interpreting the results

Examining the s-curve, one can see that the true positive predictions of the classifier and the regressor, with 0.5 as the threshold for both, are essentially the same: the curve passes close to the point (0.5, 0.5), so points that the regressor scores above 0.5 also tend to receive a classifier probability above 0.5.

For linearly separable data and a score (or probability) threshold of 0.5, the regressor should therefore predict labels about as well as the classifier.

The s-shape and the histogram on the right suggest that the classifier is more confident than the regressor, because its predicted probabilities are clumped closer to 0 or 1. Reducing the class separation (class_sep) tends to flatten the s-curve towards a straight line, which can be verified experimentally.
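One way to check this claim is to convert both sets of predictions into labels with the same 0.5 threshold and compare them. The sketch below does that under a few assumptions: labels_from_scores() is a hypothetical helper, random_state=0 is fixed only for reproducibility, and the dataset settings are copied from the experiment above. The exact numbers depend on the generated data, so none are quoted here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def labels_from_scores(scores, threshold=0.5):
    """Turn continuous scores into 0/1 labels (hypothetical helper)."""
    return (np.asarray(scores) >= threshold).astype(int)


# Same generator settings as in the article; random_state=0 is an assumption
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, n_repeated=0, class_sep=2.0,
                           flip_y=0.5, weights=[0.6, 0.4], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
regr = LinearRegression().fit(X_train, y_train)

# Apply the same 0.5 threshold to probabilities and to regression scores
clf_labels = labels_from_scores(clf.predict_proba(X_test)[:, 1])
regr_labels = labels_from_scores(regr.predict(X_test))

clf_acc = accuracy_score(y_test, clf_labels)
regr_acc = accuracy_score(y_test, regr_labels)
agreement = np.mean(clf_labels == regr_labels)

print(f"classifier accuracy: {clf_acc:.3f}")
print(f"regressor accuracy:  {regr_acc:.3f}")
print(f"label agreement:     {agreement:.3f}")
```

An agreement close to 1 means the two models draw nearly the same decision boundary, even though their raw outputs are distributed differently.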
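The flattening can be checked with a small sweep over class_sep. The sketch below uses the Pearson correlation between regressor scores and classifier probabilities as a rough measure of how straight the curve is: a value near 1 means the points lie almost on a line. Several things here are assumptions rather than part of the original experiment: the helper name score_probability_correlation(), the fixed seed, and the choice of n_clusters_per_class=1 and flip_y=0.0, which are set to isolate the effect of class separation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split


def score_probability_correlation(class_sep, seed=0):
    """Correlation between regressor scores and classifier probabilities.

    Hypothetical helper: values close to 1 indicate a near-linear
    relationship, i.e. a flattened s-curve.
    """
    X, y = make_classification(
        n_samples=1000, n_features=2, n_informative=2,
        n_redundant=0, n_repeated=0,
        n_clusters_per_class=1,  # assumption: one blob per class
        flip_y=0.0,              # assumption: no label noise
        class_sep=class_sep, random_state=seed)
    X_train, X_test, y_train, _ = train_test_split(
        X, y, test_size=0.4, random_state=seed)

    proba = LogisticRegression().fit(X_train, y_train).predict_proba(X_test)[:, 1]
    scores = LinearRegression().fit(X_train, y_train).predict(X_test)
    return np.corrcoef(scores, proba)[0, 1]


# Sweep from heavily overlapping classes to well separated ones
correlations = {sep: score_probability_correlation(sep)
                for sep in (0.2, 0.5, 1.0, 2.0, 3.0)}

for sep, corr in correlations.items():
    print(f"class_sep={sep:.1f}  correlation={corr:.4f}")
```

Plotting y_pred against y_pred_proba[:, 1] for each value of class_sep, as in the scatter plot above, gives the same picture visually.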

Conclusion

We need to interpret these results cautiously. The regressor behaves like an underconfident classifier. With a threshold of 0.5, it predicts labels in almost the same way as the classifier, at least for linearly separable data.