The inevitable assignment in your deep learning class: neural networks using only NumPy
It comes in handy during interviews, for beginners and experts alike.
Defining a neural network
We define the architecture of a fully connected feed-forward neural network with hyperparameters: the number of layers, the number of neurons in each layer, and the activation function of each layer.
Each layer's input size repeats the previous layer's output size, as in the sketch below; this redundancy makes the specification easy to parse and intuitive to read.
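For concreteness, here is one way such a specification might look; the tuple format and the layer sizes are illustrative assumptions, not a fixed convention:

```python
# A hypothetical architecture specification: one tuple per layer,
# giving (input_size, output_size, activation).
# Note how each layer's input size repeats the previous layer's output size.
architecture = [
    (784, 128, "relu"),    # input -> first hidden layer
    (128, 64, "relu"),     # first hidden -> second hidden layer
    (64, 10, "sigmoid"),   # second hidden -> output layer
]
```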
The basics
Depending on its architecture, the neural network will have weights, biases, and activation functions. We also need the network to support forward propagation (computing y given x) and backward propagation (updating the weights given the ground truth and the prediction). For simplicity, we train the network only with mean squared error (MSE) loss and stochastic gradient descent (SGD).
So we will start with a basic class definition that provides the above features.
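Here is a minimal sketch of what that class might look like; the tuple-based architecture format from above is my assumption, and the method bodies are stubs we fill in as we go:

```python
import numpy as np

class NeuralNetwork:
    """Fully connected feed-forward network trained with MSE loss and SGD."""

    def __init__(self, architecture):
        # architecture: a list of (input_size, output_size, activation) tuples
        self.weights = []               # weights[i] is layer i's weight matrix
        self.biases = []                # biases[i] is layer i's bias vector
        self.activation_functions = []  # activation_functions[i] is a name string

    def forward(self, x):
        """Compute the prediction y for an input x (filled in below)."""
        raise NotImplementedError

    def backward(self, x, y_true):
        """Update weights and biases via SGD on the MSE loss
        (covered in the follow-up article on backpropagation)."""
        raise NotImplementedError
```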
Building the network
Now that we know the hyperparameters, we can initialise the weight matrices and store them in member variables. weights[i] is a NumPy array containing the weight matrix of layer i; the same holds for biases[i] and activation_functions[i].
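Continuing the NeuralNetwork class, the initialisation might look like the following sketch; the small random scale of 0.01 is an arbitrary choice here, not a prescription:

```python
    def __init__(self, architecture):
        self.weights = []
        self.biases = []
        self.activation_functions = []
        for in_size, out_size, activation in architecture:
            # Small random weights break symmetry between neurons;
            # the 0.01 scale is an assumption of this sketch.
            self.weights.append(np.random.randn(out_size, in_size) * 0.01)
            self.biases.append(np.zeros((out_size, 1)))
            self.activation_functions.append(activation)
```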
Activation functions
Setting the network aside for now, we will prepare the activation functions. We use two simple, commonly used activation functions: sigmoid and ReLU. You are welcome to use your own; the rest of the code is written so that they are plug-and-play.
The derivative of each activation function is essential for implementing backpropagation.
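A sketch of the two functions and their derivatives (the names, like sigmoid_derivative, are my own choice):

```python
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    return np.maximum(0.0, x)

def relu_derivative(x):
    # Gradient is 1 where the input is positive, 0 elsewhere.
    return (x > 0).astype(float)
```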
We now define dictionaries that map strings to the function definitions. If you use other activation functions, just make sure to include them in these dictionaries and the rest of the code will work fine.
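Those lookup dictionaries might look like this, assuming the function names from the previous snippet:

```python
# Map activation names (as used in the architecture spec) to functions.
ACTIVATIONS = {
    "sigmoid": sigmoid,
    "relu": relu,
}

ACTIVATION_DERIVATIVES = {
    "sigmoid": sigmoid_derivative,
    "relu": relu_derivative,
}
```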
Forward propagation
We now have all the necessary elements to write the forward() function. Given an input vector, the function sequentially multiplies it by the weight matrix, adds the bias, and applies the activation function for each layer.
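Here is a sketch of forward(), assuming the ACTIVATIONS dictionary from above and inputs shaped as column vectors:

```python
    def forward(self, x):
        """Propagate a column-vector input x through every layer."""
        a = x
        for W, b, name in zip(self.weights, self.biases,
                              self.activation_functions):
            z = W @ a + b             # multiply by weights, add bias
            a = ACTIVATIONS[name](z)  # apply the layer's activation function
        return a
```

With the hypothetical architecture from earlier, it can be used like this:

```python
net = NeuralNetwork(architecture)
x = np.random.randn(784, 1)   # one input as a column vector
y = net.forward(x)            # prediction of shape (10, 1)
```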
I hope you followed along and were able to write working code up to this point. Backpropagation is a bit more involved, so I've written a separate article that explains it thoroughly.
Thanks for reading. Do comment your queries and suggestions.