Perceptron From Scratch

Power of a neuron

Neural networks are fascinating, but they can be complicated to understand. To simplify things, we start with the simplest unit of a neural network: the perceptron. We want to understand what role the perceptron plays and how it functions as a simple logical unit. This should improve our understanding of neural networks as a whole.

What is a perceptron?

A perceptron is an artificial neuron that takes in several inputs, processes them and produces an output. There are three components constituting the input side of a perceptron:

  • the inputs themselves
  • a weight associated with each input
  • a bias

The perceptron computes its output as follows: $$ f(x) = w_0+w_1x_1+w_2x_2+w_3x_3+\dots+w_nx_n \tag{1} $$

In other words, the output of a perceptron is the weighted sum of its inputs, optionally passed through an activation function. We will not be considering an activation function in this exploration.

In the expression above, \(w_0\) is the bias and each \(w_ix_i\) is an input multiplied by its corresponding weight.
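
To make equation (1) concrete, here is a minimal numpy sketch; the weight and input values are arbitrary examples, not taken from the implementation later in this post:

import numpy as np

w = np.array([0.5, -0.2, 0.1, 0.4])   # w0 (the bias), w1, w2, w3
x = np.array([1.0, 0.0, 1.0, 1.0])    # a constant 1 for the bias, then x1, x2, x3
f_x = np.dot(w, x)                    # w0*1 + w1*x1 + w2*x2 + w3*x3
print(f_x)                            # 0.5 + 0.0 + 0.1 + 0.4 = 1.0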

The perceptron resembles the biological neuron in that it has inputs, processing and an output. Many perceptrons combine to form a neural network with many layers, capable of performing complex learning tasks.

How does a perceptron learn?

Perceptrons learn in a way loosely analogous to the human brain. The biological neuron has structures like the nucleus, dendrites and axon: the dendrites collect information from other neurons, the neuron processes it and sends the result out through the axon (the long tail of the neuron). The perceptron is a derivative of the neuron and has a similar structure, but in a much simplified form.

What does it mean for a perceptron to learn? Perceptrons learn through supervised methods. A perceptron learns by adjusting its weights (see equation (1) above) so as to find the best approximation of the correct function \(f(x)\) that gives the expected set of outputs for the corresponding set of inputs of the form \(x_1 \dots x_n\). This is done by training the perceptron repeatedly on a training set containing many training samples. Each such repetition is called an epoch.

Each epoch takes the perceptron closer and closer to the expected function \(f(x)\) by the process of adjusting its weights and bias. This is also called fitting the model to the training data.

The cost function

We are always interested in knowing how far we have come in training our perceptron. This is measured by a so-called cost function. Although there are many kinds of cost functions, we will stick to the quadratic cost function, which is defined as follows:

$$ E(\vec{w}) = 0.5\sum\limits_{d \in D} (t_d - o_d)^2 \tag{2} $$

where \(E\), \(\vec{w}\), \(d \in D\), \(t_d\) and \(o_d\) are, respectively, the error (cost), the weight vector, a single training instance drawn from the training set, the target output for that instance and the actual output for that instance. The total error is a summation of the errors across all the training instances.
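
As a quick numerical illustration of equation (2), with hypothetical target and output values:

import numpy as np

t = np.array([0.25, 0.50, 0.75])    # target outputs t_d for three training instances
o = np.array([0.20, 0.55, 0.70])    # actual outputs o_d produced by the perceptron
E = 0.5 * np.sum((t - o) ** 2)      # equation (2)
print(E)                            # 0.5 * (0.0025 + 0.0025 + 0.0025) = 0.00375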

This function gives us a parabolic error surface with a global minimum. The error is a function of the weight vector, so to arrive at the weight vector that gives us the minimum error, we need to descend the error surface to its global minimum. The gradient of the error,

$$ \nabla{E(\vec{w})} \tag{3} $$

points in the direction of steepest ascent, so the update to the weight vector moves in the opposite direction:

$$ \Delta{\vec{w}} = -\eta\nabla{E(\vec{w})} \tag{4} $$

where \(\eta\) is the learning rate, chosen small enough that we do not skip over the minimum of the parabolic error surface.

After a few derivation steps (based on ‘Machine Learning’ by Tom M. Mitchell), we arrive at the update rule for an individual weight:

$$ \Delta{w_i} = \eta\sum\limits_{d \in D}(t_d-o_d)x_{id} \tag{5} $$

The above expression considers the entire training set \(D\) to compute the change in a weight. However, we will use a variation called stochastic gradient descent (SGD), where the weights are updated after each training instance. With a sufficiently small learning rate, this approach approximates standard gradient descent to an arbitrary degree of closeness.
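
To make the SGD form of the update concrete, here is a minimal sketch of a single-instance update; the values are illustrative, not real training data:

import numpy as np

eta = 0.01                              # learning rate
w = np.array([0.0, 0.1, -0.1])          # current weights, w0 is the bias
x = np.array([1.0, 1.0, 0.0])           # one training instance, with 1 prepended for the bias
t = 0.5                                 # target output for this instance
o = np.dot(w, x)                        # actual output, equation (1)
w = w + eta * (t - o) * x               # update every weight at once: equation (5) for a single instance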

Follow-up after training

Two steps that closely follow the training step are:

  1. Evaluation of the training process itself.
  2. Evaluation of the performance of the trained perceptron.

For evaluation of the training process, we need to keep track of the cost function for both the training set and the cross-validation set and plot them to watch their convergence. Optionally, we can also plot the convergence of the weights. This data is captured after each epoch.

For evaluation of the trained perceptron performance, we can consider classification or regression metrics that are appropriate to the problem at hand.
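
Since the D2A task below is a regression problem, a simple regression metric such as the mean squared error would be appropriate. A minimal sketch, with hypothetical true and predicted values:

import numpy as np

y_true = np.array([0.125, 0.250, 0.375])
y_pred = np.array([0.120, 0.260, 0.370])
mse = np.mean((y_true - y_pred) ** 2)   # mean squared error over the test samples
print(mse)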

Implementation notes

We will now use the above theoretical explanation to implement a simple perceptron, train it as a D2A (digital-to-analog) converter, evaluate the training process, and finally evaluate the performance of the trained perceptron.

The perceptron class

#!/usr/bin/env python3

"""
Module to implement a single perceptron and a
helper class to generate training data.

Reference implementation: Create a D2A converter using a 
single perceptron.
"""
import numpy as np
import matplotlib.pyplot as plt

class Perceptron():
    """
    Class to implement an n-input perceptron with identity activation.
    
    Arguments:
        n_inputs: no. of inputs
        
    Attributes:
        _weights: the current weight vector (w0,...,wn)
        _weights_hist: history of weight vectors, one per epoch
        _cost_hist: list of cost function values, one per epoch
        _epochs: no. of epochs the perceptron has been trained for
        
    Methods:
        fit(X,y,epochs=10,batch_size=1): train the perceptron with
            training data X,y for epochs(default 10) with batch size
            of batch_size(default 1; currently unused, updates are per sample)
        predict(X): predict y for test data X on an already fitted perceptron.
            
    Since the training  and test data are deterministic,
    we do not need cross-validation.  It is enough if we
    just look at the cost function.
    """

    def __init__(self, n_inputs):
        """Create perceptron with n inputs."""
        
        self._n_inputs = n_inputs
        # Generate n_inputs+1 small random weights in [-0.1, 0.1): one per input plus the bias.
        self._weights = np.random.uniform(low=-0.1, high=0.1, size=(n_inputs+1,))
        # History of weights, one entry per epoch.  At the end of training, the
        # last entry holds the weights of the trained perceptron.
        self._weights_hist = []
        # store the initial weights first
        self._weights_hist.append(self._weights)
        
        # Array of costs 1 per epoch.
        self._cost_hist = []
        
        # For how many epochs have we trained?
        self._epochs = 0
                
        
    def fit(self,X,y, epochs=10, batch_size=1):
        """Train the perceptron with X,y
        
        Arguments:
            X: training inputs, one bit vector per sample
            y: training targets, one value per sample
            epochs: no. of epochs to train
            batch_size: batch size (currently unused; weights are updated per sample)
        """
        lr = 0.01  #learning rate
        
        for i in range(epochs):
            sq_err_sum = 0.0
            for x_vec, y_true in zip(X, y):
                x_vec = np.append([1], x_vec)  # Prepend 1 for the bias
                output = np.dot(self._weights, x_vec)

                # Calculate delta_w from the error gradient and the learning
                # rate, then update the weights (equation (5), SGD form)
                delta_w = lr*x_vec*(y_true-output)
                self._weights = self._weights+delta_w
                sq_err_sum += (y_true-output)**2

            # Keep track of the cost function (equation (2)) over the whole epoch
            self._cost_hist.append(0.5*sq_err_sum)
                            
            #keep weights history
            self._weights_hist.append(self._weights)
                
            self._epochs += 1  #Increment epochs count
            
        #Return training history
        return {"cost":self._cost_hist, "weights":self._weights_hist,"epochs":self._epochs}
        
    def predict(self,X):
        """
        Predict target y for test data X
        Arguments: X test data
        """
        # Insert a 1 in the first column (index 0) of every row of X, for the bias
        _X = np.insert(X, 0, 1, axis=1)
        return np.dot(_X,self._weights)
        
    def __repr__(self):
        """
        Print representation of the object.
        """
        return f'perceptron <{id(self):x}> inputs:{self._n_inputs}, epochs:{self._epochs}, weights:{self._weights}'

This is an implementation of a perceptron parameterized on the number of inputs.

The constructor does the following:

  • Stores n_inputs and generates the corresponding random weights and bias.
  • Creates provision to capture the weights history, the cost history and the epoch count.

The fit() method, which trains the perceptron over the specified number of epochs:

  • uses equation (5), in its SGD form, to update the weight vector after each training instance.
  • uses equation (2) to calculate the cost at the end of each epoch.
  • accumulates the history of both the cost and the weights.

We exploit the capabilities of numpy to vectorize our computations.

The predict() method, which predicts values, involves:

  • equation (1) to calculate the dot product of the weights and the inputs.

There is no need to compute the error or the cost here, because we are out of the training phase and already have a trained perceptron.

The data generator class

Although not part of the perceptron implementation itself, a data generator is very important to our task. After all, a model is only as good as the data it is trained on. It has two methods:

  • get_samples() method to get random vector samples for training
  • get_cons_samples() method to get consecutive samples for testing and evaluation
class Vector_generator():
    """ 
    Generate list of binary digit vectors of specified length and width.
    
    Arguments:
        n_bits: width of the vector(default 8).
        
    Methods:
        get_samples(n_samples=10): get specified number of random
            samples(default 10) of type X,y with y scaled[0,1)
        get_cons_samples(): get consecutive samples from 0 to 2^n_bits-1        
    """

    def __init__(self, n_bits=8):
        """Initialize a vector generator of width n_bits"""
        self._n_bits = n_bits
        self._X_arr = []
        self._y_arr = []
        
    def get_samples(self, n_samples=10):
        """Get specified no. of random vectors(default 10)"""
        
        self._X_arr = []
        self._y_arr = []
        
        for i in range(n_samples):
            # Get one random integer in the range [0, 2**n_bits)
            n = np.random.randint(2**self._n_bits)
            # Convert it to binary vector
            X = [int(i) for i in list(f'{n:0{self._n_bits}b}')]
            # Alternatively:
            # X = np.array(list(f'{n:0{self._n_bits}b}')).astype(int)
            # Its equivalent floating point value in range [0,1)
            y = round(n/(2**self._n_bits),6)
            self._X_arr.append(X)
            self._y_arr.append(y)
            
        return self._X_arr, self._y_arr

        
    def get_cons_samples(self):
        """get consecutive samples from 0 to 2^n_bits-1"""
        
        self._X_arr = []
        self._y_arr = []
        
        for cons_n in range(2**self._n_bits):
            # Get the integer from the range 0,2**n_bits
            # Convert it to binary vector
            X = [int(i) for i in list(f'{cons_n:0{self._n_bits}b}')]
            y = round(cons_n/(2**self._n_bits),6)
            self._X_arr.append(X)
            self._y_arr.append(y)
            
        return self._X_arr, self._y_arr
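
As a quick sanity check of the mapping the generator produces, here is a small usage example with a 3-bit width:

gen = Vector_generator(n_bits=3)
X, y = gen.get_cons_samples()
print(X[5], y[5])   # [1, 0, 1] 0.625 -- the bit vector for 5 maps to 5/8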

Training and evaluation

The following code illustrates how to use the perceptron class to instantiate, train and evaluate a perceptron.

if __name__ == "__main__":
    # Create a data generator
    v_gen10bit = Vector_generator(n_bits = 10)    
    X,y = v_gen10bit.get_samples(n_samples=200)
    
    #Create the perceptron
    percep = Perceptron(n_inputs = 10)
    print(percep)
    
    #Train the perceptron
    training_history = percep.fit(X,y,epochs=50)
    print(percep)

    #plot the minimization of the cost function and ...    
    #...the convergence of the weights+bias
    #In short, plot the training progress.
    fig,ax = plt.subplots(1,3, figsize=(15,5))
    
    # Some nuances of plotting. For plotting costs, epochs start at 1
    # Cost at the end of each epoch    
    ax[0].plot(np.arange(1,training_history["epochs"]+1),training_history["cost"], '.-')
    ax[0].set_xticks(np.arange(0,training_history["epochs"]+1,5))
    ax[0].set_title("Cost function vs. Epoch")
    ax[0].set_xlabel("Epoch")
    ax[0].set_ylabel("Cost")
    ax[0].grid()
    

    # There is one more set of weights than the number of epochs because
    # we also plot the initial weights from before the first epoch,
    # so the weight plot's x-axis starts at epoch 0.

    ax[1].plot(training_history["weights"], '.-')
    ax[1].set_xticks(np.arange(0,training_history["epochs"]+1,5))
    ax[1].set_title("Weight vs. Epoch")
    ax[1].set_xlabel("Epoch")
    ax[1].set_ylabel("Weights")
    ax[1].grid()

    #Try out predictions with our perceptron
    #Generate consecutive data
    Xt,yt = v_gen10bit.get_cons_samples()
    y_pred = percep.predict(Xt)
    
    #Plot y_pred vs. y_true
    ax[2].scatter(yt, y_pred, c = 'c', alpha=0.5, label='predicted')
    ax[2].plot(yt,yt, 'k', label='identity')
    ax[2].set_title("y_pred vs. y_true scaled [0,1]")
    ax[2].set_xlabel("y_true")
    ax[2].set_ylabel("y_pred")
    ax[2].legend(loc='best')
    ax[2].grid()
    plt.tight_layout()
        
    plt.show()

Evaluation using plots

The first two plots track the training process. The last shows how well our perceptron output tracks the true output.

(Figure: training and evaluation plots)

Experiment with the code

  • Vary the number of inputs. Create and train perceptrons with these inputs.
  • Change the learning rate. What happens if we choose values that are too small or too large?
  • Change the number of epochs. How does an inadequately trained perceptron behave?
  • Train the perceptron with insufficient training data. For instance, train an 8-input perceptron with only 128 samples. How does the perceptron fare in the evaluation?

To get the complete version of the code, refer to this GitHub link.