- Here is our main Dropout function with three arguments: a RandomStream generator, any Theano tensor (the weights of a neural net), and a float value denoting the proportion of neurons to drop.
- Starting from the first line, we create a Theano tensor variable for the input (words) and another variable that takes a float value denoting the proportion of neurons to be dropped.
The reason I wanted to write about this is that, if you are working with a low-level library like Theano, some of its modules might get a bit tricky to use.
Implementing a Dropout Layer with Numpy and Theano along with all the caveats and tweaks.
Almost everyone working with Deep Learning will have heard at least a smattering about Dropout. It is a simple concept (introduced a couple of years ago) that sounds like a pretty obvious way of doing model averaging, resulting in a more generalized and regularized neural net. Still, when you actually get into the nitty-gritty details of implementing it in your favourite library (Theano being mine), you might find some roadblocks. Why? Because it’s not exactly straightforward to randomly deactivate some neurons in a DNN.
In this post, we’ll just recapitulate what has already been explained in detail about Dropout in a lot of papers and online resources (some of these are provided at the end of the post). Our main focus will be on implementing a Dropout layer in NumPy and Theano, while taking care of all the related caveats. You can find the Jupyter Notebook with the Dropout Class here.
Regularization is a technique to prevent overfitting in a machine learning model. Since a DNN has a highly complex function to fit, it can easily overfit on a small or intermediate-sized dataset.
In very simple terms, Dropout is a highly efficient regularization technique in which, for each iteration, we randomly remove some of the neurons in a DNN (along with their connections; have a look at Fig. 1). So how does this help in regularizing a DNN? Well, by randomly removing some of the cells in the computational graph (the neural net), we prevent some of the neurons (which are basically hidden features in a neural net) from overfitting on all of the training samples. So, this is more like considering only a handful of features (neurons) for each training sample and producing the output based on these features only. This results in a completely different neural net (hopefully ;)) for each training sample, and eventually our output is the average of these different nets (any -phile here? :D).
In Fig. 1, we have a fully connected deep neural net on the left side, where each neuron is connected to the neurons in its upper and lower layers. On the right side, we have randomly omitted some neurons along with their connections. For every learning step, the neural net on the right will have a different representation. Consequently, only the connected neurons and their weights will be learned in a particular learning step.
Left: DNN without Dropout, Right: DNN with some dropped neurons
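To make the idea of "randomly removing neurons" concrete, here is a minimal NumPy sketch. It is only an illustration under assumed values: the array sizes, the seed and the names `activations`, `mask` and `thinned` are hypothetical and do not come from the original notebook.

```python
import numpy as np

rng = np.random.RandomState(0)    # arbitrary seed, only for repeatability
activations = np.ones((4, 6))     # hypothetical: 4 training samples, 6 hidden neurons
drop = 0.5                        # proportion of neurons to drop

# 1 = neuron kept, 0 = neuron dropped; every sample (row) gets its own mask
mask = rng.binomial(n=1, p=1.0 - drop, size=activations.shape)

# Multiplying by the mask silences the dropped neurons for this step only
thinned = activations * mask

print(mask)
```

Each row of `mask` plays the role of one "thinned" network: a new mask is drawn at every learning step, so a different subset of neurons is trained each time.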
Let’s dive straight into the code for implementing a Dropout layer. If you don’t have prior knowledge of Theano and NumPy, then please go through these two awesome blog posts by @dennybritz: Implementing a Neural Network from Scratch and Speeding up your neural network with theano and gpu.
Whenever we are dealing with random numbers, it is advisable to set a random seed.
We use a NumPy random number generator, which exposes a number of methods for generating random numbers drawn from a variety of probability distributions.
We also use a random streams implementation in Theano, which works on GPUs as well.
From it we create an object which will provide us with random streams in each run of our optimization function.
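The seeding step can be sketched with NumPy alone. The seed value and the variable names below are assumptions made for this illustration, and the Theano side is deliberately left out.

```python
import numpy as np

SEED = 1234  # arbitrary value chosen for this sketch

np.random.seed(SEED)                 # seeds NumPy's global generator
rng = np.random.RandomState(SEED)    # a separate, explicitly seeded generator

# The generator exposes draws from many probability distributions
uniform_draw = rng.uniform(size=3)
normal_draw = rng.normal(loc=0.0, scale=1.0, size=3)
binomial_draw = rng.binomial(n=1, p=0.5, size=3)

# An integer drawn from `rng` could in turn be used as the seed
# of another random number source
stream_seed = rng.randint(999999)
```

Re-creating `np.random.RandomState(SEED)` and repeating the same calls reproduces the same numbers, which is what makes a Dropout experiment repeatable.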
Now, one thing to keep in mind is – we only want to drop neurons during the training phase and not during the validation or test phase. Also, we need to somehow compensate for the fact that during the training time we deactivated some of the neurons. There are two ways to achieve this:
Scaling the Weights (implemented at the test phase): Since our resulting neural net is an averaged model, it makes sense to use the averaged value of the weights during the test phase, considering the fact that we are not deactivating any neurons there. The easiest way to do this is to scale the weights (which acts as averaging) by the probability with which neurons were retained in the training phase. This is exactly what our Dropout function does.
Inverted Dropout (implemented at the training phase): Scaling the weights has its caveats, since we have to tweak the weights at test time. Inverted Dropout, on the other hand, performs the scaling at training time: the retained values are divided by the retain probability, so nothing needs to be rescaled later. As a result, we don’t have to tweak the test code whenever we decide to change the order of the Dropout layer. In this post, we’ll be using the first method (scaling), although I’d recommend that you play with Inverted Dropout as well. You can follow this up for guidance.
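The difference between the two approaches can be sketched in plain NumPy. This is not the code from the original notebook: the function names `standard_dropout` and `inverted_dropout`, the seed and the input of all ones are assumptions chosen so that the averages are easy to read.

```python
import numpy as np

def standard_dropout(x, drop, rng, train):
    """Drop during training; scale by the retain probability at test time."""
    retain = 1.0 - drop
    if train:
        return x * rng.binomial(n=1, p=retain, size=x.shape)
    return x * retain

def inverted_dropout(x, drop, rng, train):
    """Drop and rescale during training; leave the test phase untouched."""
    retain = 1.0 - drop
    if train:
        return x * rng.binomial(n=1, p=retain, size=x.shape) / retain
    return x

rng = np.random.RandomState(42)   # arbitrary seed for this sketch
x = np.ones(100000)               # hypothetical input, all ones
drop = 0.5

# Average value seen in each phase under both schemes
std_train_mean = standard_dropout(x, drop, rng, train=True).mean()
std_test_mean = standard_dropout(x, drop, rng, train=False).mean()
inv_train_mean = inverted_dropout(x, drop, rng, train=True).mean()
inv_test_mean = inverted_dropout(x, drop, rng, train=False).mean()
```

In both schemes the training-time and test-time averages agree up to sampling noise; the only difference is where the rescaling happens.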
Which of the two behaviours is applied depends on whether a training flag is on or off. While the model is in the training phase, we use dropout on the model weights; in the test phase, we simply scale the weights to compensate for all the training steps in which we omitted some random neurons.
Finally, here’s how you can add a Dropout layer in your DNN. I am taking the example of an RNN, similar to the one used in this blog:
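A NumPy stand-in for this train/test switch could look like the sketch below. The names `drop_weights`, `scale_weights` and `dropout_layer`, and the `train` argument, are hypothetical. In Theano the flag would typically be a symbolic variable, so a symbolic conditional such as `theano.ifelse.ifelse` or `theano.tensor.switch` would be needed instead of a Python `if`; that part is an assumption about how the original code was structured.

```python
import numpy as np

def drop_weights(rng, weight, drop):
    """Training phase: zero out a random subset of the weights."""
    retain_prob = 1.0 - drop
    mask = rng.binomial(n=1, p=retain_prob, size=weight.shape)
    return weight * mask

def scale_weights(weight, drop):
    """Test phase: keep every weight, scaled by the retain probability."""
    return (1.0 - drop) * weight

def dropout_layer(rng, weight, drop, train=True):
    """Pick the training or test behaviour depending on the flag."""
    if train:
        return drop_weights(rng, weight, drop)
    return scale_weights(weight, drop)

rng = np.random.RandomState(7)        # arbitrary seed for this sketch
W = rng.normal(size=(3, 4))           # hypothetical weight matrix
W_train = dropout_layer(rng, W, drop=0.2, train=True)
W_test = dropout_layer(rng, W, drop=0.2, train=False)
```

The three arguments of `drop_weights` mirror the generator, tensor and drop proportion described for the main Dropout function; everything else is illustrative.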
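Starting from the first line, we create a Theano tensor variable for the input (words) and another variable, which will take a float value to denote the proportion of neurons to be dropped.
Next comes the structure containing the model parameters.
The result is the set of weights with a few dropped neurons.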
The steps above generate some updates, and all of the Theano functions that follow this code should be made aware of these updates. So let’s have a look at this code:
These updates have to be passed to the Theano function. Otherwise, Theano will throw an error.
The drop proportion suggested in the original paper is generally good enough, although you can always try to tweak it a bit and see what works best for your model.
Lately, there has been a lot of research into better regularization methods for DNNs. One of the things that I really like about Dropout is that it’s conceptually very simple as well as a highly effective way to prevent overfitting. A few more methods that are increasingly being used in DNNs nowadays (I am omitting the standard L1/L2 regularization here):
Batch Normalization: Batch Normalization primarily tackles the problem of internal covariate shift by normalizing the inputs to each layer over every mini-batch. So, in addition to simply using normalized inputs at the beginning of the training process, Batch Normalization keeps on normalizing the intermediate activations during the whole training phase. This accelerates the optimization process and, as a side product, might also eliminate the need for Dropout. Have a look at the original paper for a more in-depth explanation.
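As a rough illustration of the normalization step, here is a minimal NumPy sketch of a training-time Batch Normalization forward pass. The function name `batch_norm_forward`, the `eps` value and the input sizes are assumptions made for this example; the running statistics used at test time and the backward pass are omitted.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift."""
    mean = x.mean(axis=0)                     # per-feature mean over the batch
    var = x.var(axis=0)                       # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta               # learnable scale and shift

rng = np.random.RandomState(0)                       # arbitrary seed for this sketch
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))     # hypothetical mini-batch
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
```

With `gamma` set to ones and `beta` to zeros, every column of `out` has a mean close to 0 and a standard deviation close to 1, regardless of the scale of the input.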
Working at this low level might get a bit tricky. For prototyping, and even for production purposes, you should also consider other high-level libraries like Keras and TensorFlow.
Feel free to add any other regularization methods, as well as your feedback, in the comments section.