Note: This is a short version of “Deep Learning – The Unreasonable Effectiveness of Randomness”.
The paper-submission deadline for ICLR 2017 in Toulon, France has arrived, and instead of a trickle of new knowledge about Deep Learning we get a massive deluge. This is a gold mine of research, hot off the presses. Many papers are incremental improvements on state-of-the-art algorithms. I had hoped to find more fundamental theoretical and experimental results on the nature of Deep Learning; unfortunately, there were just a few. There were, however, two developments that were mind-boggling.
The first is the mind-boggling discovery that you can train a neural network to learn to learn (i.e. meta-learning). More specifically, several research groups have trained neural networks to perform stochastic gradient descent (SGD). Not only have they demonstrated networks that learned to do SGD, those networks performed better than any hand-tuned human method! The two papers that were submitted are “Deep Reinforcement Learning for Accelerating the Convergence Rate” and “Optimization as a Model for Few-Shot Learning”. Unfortunately, though, these two groups had previously been scooped by DeepMind, who showed you could do this in the paper “Learning to Learn by gradient descent by gradient descent”. The latter two papers trained an LSTM to produce the updates, while the first trained its optimizer via RL. I had thought meta-learning would take a while longer to materialize, but it has arrived much sooner than I expected!
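The core idea in these papers is that the parameter update itself comes from a small recurrent network: instead of θ ← θ − α∇f(θ), you compute (Δθ, h) = m(∇f(θ), h) and apply Δθ, where m's own weights are trained. Here is a minimal sketch of that interface (all names are mine, not from the papers); to keep it self-contained, the recurrent update rule is hand-initialized so it reduces to a bounded SGD-like step rather than actually being trained:

```python
import numpy as np

# Sketch of the "learned optimizer" interface: the optimizee is a simple
# quadratic f(theta) = ||theta - target||^2, and the optimizer is a tiny
# recurrent update rule
#     delta, h = m(grad, h)
# whose parameters (W, b, scale) would themselves be trained by gradient
# descent in the papers. Here they are hand-set so m mimics plain SGD.

rng = np.random.default_rng(0)
target = rng.normal(size=3)

def f_grad(theta):
    # Gradient of f(theta) = ||theta - target||^2
    return 2.0 * (theta - target)

class RecurrentOptimizer:
    def __init__(self, dim, scale=0.1):
        self.W = np.zeros((dim, dim))   # would be learned, zero here
        self.b = np.zeros(dim)          # would be learned, zero here
        self.scale = scale

    def step(self, grad, h):
        h = np.tanh(self.W @ h + grad)  # recurrent state carries history
        delta = -self.scale * h + self.b
        return delta, h

theta = rng.normal(size=3)
opt = RecurrentOptimizer(dim=3)
h = np.zeros(3)
for _ in range(200):
    delta, h = opt.step(f_grad(theta), h)
    theta = theta + delta

print(np.abs(theta - target).max())  # distance to the optimum after 200 steps
```

With the hand-set weights this behaves like a clipped gradient step and converges to the target; the point of the papers is that training W, b, and the step scale end-to-end discovers update rules that beat hand-tuned ones.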
Not to be outdone, two other research groups created machines that design new Deep Learning networks, and do it in such a way as to improve over the state of the art! These are machines that learn how to design neural networks. The two papers that were submitted are “Designing Neural Network Architectures using Reinforcement Learning” and “Neural Architecture Search with Reinforcement Learning”. The former describes the use of reinforcement Q-learning to discover CNN architectures (you can find some of their generated CNNs in Caffe here: https://bowenbaker.github.io/metaqnn/ ). The latter paper is truly astounding (you can’t do this without Google’s compute resources). Not only did the researchers show the generation of state-of-the-art CNN networks, the machine actually learned a few new variants of the LSTM node! Here are the LSTM nodes the machine created (left and bottom):
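The Q-learning formulation treats architecture design as a sequential decision problem: the state is the partial network built so far, each action appends a layer (or terminates), and the reward is the trained network's validation accuracy. A toy sketch in that spirit (the layer vocabulary, reward function, and all names here are made-up stand-ins; real systems train each sampled network, which is where the compute goes):

```python
import random

# Toy architecture search as tabular Q-learning. States are tuples of layer
# choices, actions add a layer or stop, and fake_accuracy() stands in for
# the expensive step of actually training the sampled network.

LAYERS = ["conv3", "conv5", "pool"]
MAX_DEPTH = 3

def fake_accuracy(arch):
    # Stand-in reward: pretend conv3 layers and having a pool layer help.
    score = 0.5 + 0.1 * arch.count("conv3")
    if "pool" in arch:
        score += 0.05
    return min(score, 0.95)

Q = {}
def q(state, action):
    return Q.get((state, action), 0.5)  # optimistic default value

random.seed(0)
eps, alpha = 0.2, 0.1                   # exploration rate, learning rate
best = (None, 0.0)
for _ in range(500):
    state, done, trajectory = (), False, []
    while not done:
        actions = LAYERS + ["stop"] if state else list(LAYERS)
        if random.random() < eps:
            a = random.choice(actions)          # explore
        else:
            a = max(actions, key=lambda x: q(state, x))  # exploit
        trajectory.append((state, a))
        if a == "stop" or len(state) + 1 == MAX_DEPTH:
            arch = state if a == "stop" else state + (a,)
            done = True
        else:
            state = state + (a,)
    reward = fake_accuracy(arch)
    if reward > best[1]:
        best = (arch, reward)
    for s, a in trajectory:             # back the reward up the trajectory
        Q[(s, a)] = q(s, a) + alpha * (reward - q(s, a))

print(best)  # best architecture found and its stand-in "accuracy"
```

Swap the stand-in reward for actual training-and-evaluation runs and the same loop becomes (a crude version of) the search these papers run at scale.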
So not only are researchers who hand-optimize gradient descent solutions out of business, so are folks who make a living designing neural architectures! This is just the beginning of Deep Learning systems bootstrapping themselves. So I must now share Schmidhuber’s cartoon that aptly describes what is happening:
This is absolutely shocking, and there’s really no end in sight to how quickly Deep Learning algorithms are going to improve. This meta-capability can be applied to itself, recursively creating better and better systems.