Deep Learning is a sub-field of Machine Learning that has its own peculiar ways of doing things. Here are 10 lessons that we’ve uncovered while building Deep Learning systems. These lessons are a bit general, although they do focus on applying Deep Learning in an area that involves structured and unstructured data.
The More Experts the Better
The one tried-and-true way to improve accuracy is to have more networks perform the inference and to combine their results. In fact, techniques like Dropout are a means of creating “implicit ensembles” where multiple subsets of superimposed networks cooperate using shared weights.
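As a minimal sketch of the idea (numpy only; the “experts” here are simulated stand-ins for trained networks, not real models): averaging the predictions of several independently noisy classifiers usually beats any single one.

```python
import numpy as np

rng = np.random.default_rng(0)
true_labels = rng.integers(0, 2, size=1000)   # binary ground truth

def noisy_expert(labels, accuracy, rng):
    """A stand-in for one trained network: emits a probability that
    agrees with the label `accuracy` of the time, flipped otherwise."""
    flip = rng.random(labels.size) > accuracy
    probs = np.where(flip, 1 - labels, labels).astype(float)
    # add some confidence jitter around 0/1
    return np.clip(probs + rng.normal(0, 0.2, labels.size), 0, 1)

experts = [noisy_expert(true_labels, 0.7, rng) for _ in range(9)]

single_acc = np.mean((experts[0] > 0.5) == true_labels)
ensemble_acc = np.mean((np.mean(experts, axis=0) > 0.5) == true_labels)
print(single_acc, ensemble_acc)   # averaging the experts lifts accuracy
```

Because each expert makes (mostly) independent mistakes, the averaged prediction cancels much of the error — the same intuition behind both explicit ensembles and Dropout’s implicit ones.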
Seek Problems where Labeled Data is Abundant
The current state of Deep Learning is that it works well only in a supervised context. The rule of thumb is around 1,000 labeled samples per class. So if you are given a problem where you don’t have enough data to train with, consider an intermediate problem that does have more data, train on that, and then feed its results into a simpler algorithm for your original problem.
Search for ways to Synthesize Data
Not all data is nicely curated and labeled for machine learning. Many times you have data that is only weakly tagged. If you can join data from disparate sources to build a weakly labeled set, this approach works surprisingly well. The best-known example is Word2Vec, where word representations are learned from the words that happen to appear in proximity to other words.
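To make the Word2Vec example concrete, here is a minimal sketch (pure Python, toy corpus) of how skip-gram training pairs are synthesized from raw, unlabeled text — each word becomes a noisy “label” for its neighbors:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs from a token list."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

corpus = "the quick brown fox jumps over the lazy dog".split()
pairs = skipgram_pairs(corpus, window=2)
print(pairs[:4])
# → [('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ('quick', 'brown')]
```

No human labeled anything here — the supervision signal is manufactured from the structure of the data itself.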
Leverage Pre-trained Networks
One of the spectacular capabilities of Deep Learning networks is that bootstrapping from an existing pre-trained network and fine-tuning it on a new domain works surprisingly well.
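A minimal numpy sketch of the pattern (the “pretrained” weights here are simulated stand-ins — in practice you would load, say, an ImageNet model): keep the feature extractor frozen and train only a fresh output layer on the target task.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated "pretrained" feature extractor: in a real system these
# weights would be loaded from a model trained on another task.
W_pretrained = 0.3 * rng.normal(size=(20, 32))   # input_dim x feature_dim

def features(x):
    return np.tanh(x @ W_pretrained)             # frozen, never updated

# Small labeled dataset in the new domain (hypothetical).
X = rng.normal(size=(200, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Train only a fresh logistic-regression head on the frozen features.
F = features(X)
w = np.zeros(32)
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(F @ w)))           # sigmoid output
    w -= 0.1 * F.T @ (p - y) / len(y)            # logistic-loss gradient step

acc = np.mean(((1.0 / (1.0 + np.exp(-(F @ w)))) > 0.5) == y)
print(f"training accuracy with frozen extractor: {acc:.2f}")
```

Only the small head is trained, which is why fine-tuning needs far less target-domain data than training from scratch.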
Don’t forget to Augment Data
Data usually carries meaning that a human is aware of but that a machine may never discover on its own. One simple example is a time feature. From the perspective of a human, the day of the week or the time of day may be important attributes; however, a Deep Learning system may never be able to surface that if all it is given is seconds since the Unix epoch.
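For the epoch-seconds example, a minimal sketch of such augmentation (pure Python; the cyclical sin/cos encoding is one common choice, not the only one):

```python
import math
from datetime import datetime, timezone

def augment_timestamp(epoch_seconds):
    """Expand raw epoch seconds into features a network can actually use."""
    t = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    hour, dow = t.hour, t.weekday()   # weekday(): 0 = Monday
    return {
        "day_of_week": dow,
        "hour_of_day": hour,
        # cyclical encoding so 23:00 and 00:00 end up close together
        "hour_sin": math.sin(2 * math.pi * hour / 24),
        "hour_cos": math.cos(2 * math.pi * hour / 24),
        "is_weekend": dow >= 5,
    }

print(augment_timestamp(0))   # 1970-01-01 00:00 UTC was a Thursday
```

The network could in principle learn these periodicities itself, but handing them over as explicit features is far cheaper than hoping it rediscovers the calendar.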
Explore Different Regularizations
L1 and L2 regularization are not the only regularizations out there. Explore the different kinds, and perhaps look at applying different regularizations per layer.
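As a sketch of why the choice matters (numpy, toy linear regression; the L1 update uses the standard soft-thresholding step): L1 drives irrelevant weights to exactly zero, while L2 only shrinks them.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_w = np.zeros(10)
true_w[:2] = [3.0, -2.0]                   # only the first 2 features matter
y = X @ true_w + 0.3 * rng.normal(size=100)

def fit(reg, lam=0.1, lr=0.01, steps=2000):
    w = np.zeros(10)
    for _ in range(steps):
        w -= lr * (X.T @ (X @ w - y) / len(y))   # least-squares gradient
        if reg == "l2":
            w -= lr * lam * w                    # ridge-style shrinkage
        elif reg == "l1":
            # proximal (soft-threshold) step for the L1 penalty
            w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w

w_l1, w_l2 = fit("l1"), fit("l2")
print(np.sum(w_l1 == 0), np.sum(w_l2 == 0))   # L1 zeroes the noise features
```

The two penalties produce qualitatively different models from the same data, which is exactly why it pays to experiment rather than default to one.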
Random Initialization works Surprisingly Well
There are multiple techniques to initialize your network prior to training. In fact, you can get very far just training the last layer of a network, with the previous layers left mostly random. Consider using this technique to speed up your hyper-parameter tuning explorations.
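A minimal sketch of that trick (numpy, toy regression): freeze two randomly initialized layers and fit only the last, linear layer in closed form — a cheap proxy model you could use while exploring hyper-parameters.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)   # toy regression target

# Two frozen, randomly initialized hidden layers — never trained.
W1 = rng.normal(size=(5, 64))
W2 = rng.normal(size=(64, 64)) / np.sqrt(64)
H = np.tanh(np.tanh(X @ W1) @ W2)

# Train only the last layer: ridge regression in closed form.
lam = 1e-2
w_out = np.linalg.solve(H.T @ H + lam * np.eye(64), H.T @ y)

mse = np.mean((H @ w_out - y) ** 2)
baseline = np.mean((y - y.mean()) ** 2)
print(f"mse {mse:.3f} vs baseline {baseline:.3f}")
```

The random layers act as a fixed nonlinear feature map; fitting the readout is a single linear solve, orders of magnitude cheaper than full backpropagation.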
End-to-End Deep Learning is a Hail Mary Play
A lot of researchers love to explore end-to-end Deep Learning research. Unfortunately, the most effective use of Deep Learning has been to couple it with other techniques. AlphaGo would not have been successful had Monte Carlo Tree Search not been employed.
Resist the Urge to Distribute
If you can, avoid using multiple machines (with the exception of hyper-parameter tuning). Training on a single machine is the most cost-effective way to proceed.
Convolution Networks work pretty well even beyond Images
Convolutional Networks are clearly the most successful kind of network in the Deep Learning space. However, ConvNets are not only for images; you can use them for other kinds of data (e.g. voice, time series, text).
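A minimal sketch (numpy): the same convolution operation that scans images can slide a filter along a 1-D signal such as a time series. The filter here is hand-crafted for illustration; a ConvNet would learn its filters from data.

```python
import numpy as np

# Toy time series: a slow trend with a sharp spike at t = 50.
t = np.arange(100)
signal = np.sin(t / 10.0)
signal[50] += 5.0

# A 1-D "edge detector" filter, analogous to a 2-D image kernel.
kernel = np.array([-1.0, 2.0, -1.0])
response = np.convolve(signal, kernel, mode="same")

print(int(np.argmax(np.abs(response))))   # → 50: the filter fires at the spike
```

Stacking many such learned filters, with nonlinearities in between, is all a 1-D ConvNet for audio or text really is.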
That’s all I have for now. There are certainly a lot more lessons. Let me know if you stumble on others.
You can find more details of these individual lessons at http://www.deeplearningpatterns.com