A Pattern Language for Deep Learning

A Pattern Language is built from entities called patterns that, when combined, form solutions to complex problems. Each pattern describes a recurring problem and offers a solution. Pattern languages are a way of expressing complex solutions derived from experience, so that others can gain a better understanding of those solutions.

Pattern Languages were originally promoted by Christopher Alexander to describe the architecture of buildings and towns. These ideas were later adopted by Object Oriented Programming (OOP) practitioners to describe the design of OOP programs; these were named Design Patterns. They were extended further into other domains like SOA (http://www.manageability.org/blog/stuff/pattern-language-interoperability/view) and High Scalability (http://www.manageability.org/blog/stuff/patterns-for-infinite-scalability/view).

In the domain of Machine Learning (ML) there is an emerging practice called “Deep Learning”. In ML one encounters many new terms, such as Artificial Neural Networks, Random Forests, Support Vector Machines and Non-negative Matrix Factorization. These, however, usually refer to a specific kind of algorithm. Deep Learning (DL), by contrast, is not really one kind of algorithm; rather, it is a whole class of algorithms that tend to exhibit similar ‘patterns’. DL systems are Artificial Neural Networks (ANN) constructed with multiple layers (sometimes called Multilayer Perceptrons). The idea is not entirely new, since it was first proposed back in the 1960s. However, interest in the domain has exploded with the help of advancing hardware technology (i.e. GPUs). Since 2011, DL systems have been exhibiting impressive results in the field.

The confusion with DL arises when one realizes that there are actually many implementations and that it is not just a single kind of algorithm. There are the conventional Feedforward Networks (aka Fully Connected Networks), Convolutional Networks (ConvNets), Recurrent Neural Networks (RNN) and the less commonly used Restricted Boltzmann Machines (RBM). They all share a common trait: these networks are constructed using a hierarchy of layers. One common pattern, for example, is the employment of differentiable layers; this constraint on the construction of DL systems leads to an incremental way to evolve the network into something that learns classification (see the sketch below). Many such patterns have been discovered recently, and it would be very useful for practitioners to have at their disposal a compilation of these patterns. In the next few weeks we will be sharing more details of this Pattern Language.
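
To make the differentiable-layers pattern concrete, here is a minimal sketch in plain NumPy (my own illustration, not code from the post): two stacked differentiable layers trained by gradient descent on a toy classification task. Because each layer is differentiable, the chain rule lets error signals flow backward through the stack, and small parameter updates evolve the network incrementally into a classifier. The data, layer sizes and learning rate are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data: two noisy clusters (illustrative only).
X = np.vstack([rng.normal(-1, 0.5, (50, 2)), rng.normal(1, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50).reshape(-1, 1)

# Parameters of two differentiable layers (sizes chosen arbitrarily).
W1 = rng.normal(0, 0.5, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(500):
    # Forward pass: each layer is a differentiable function of its input.
    h = np.tanh(X @ W1 + b1)   # hidden layer
    p = sigmoid(h @ W2 + b2)   # output layer: class probability

    # Backward pass: the chain rule propagates gradients layer by layer.
    dz2 = (p - y) / len(X)     # gradient of mean cross-entropy w.r.t. logits
    dW2, db2 = h.T @ dz2, dz2.sum(axis=0)
    dh = dz2 @ W2.T
    dz1 = dh * (1 - h ** 2)    # tanh derivative
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0)

    # Incremental update: small gradient steps gradually evolve the network.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("training accuracy:", ((p > 0.5) == y).mean())
```

The same pattern underlies the feedforward, convolutional and recurrent architectures mentioned above; only the form of each differentiable layer changes.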

Pattern languages are an ideal vehicle for describing and understanding Deep Learning. One would like to believe that Deep Learning has a solid fundamental foundation based on advanced mathematics. Most academic research papers will conjure up highfalutin math such as path integrals, tensors, Hilbert spaces and measure theory, but don’t let the math distract you from the reality that our understanding is minimal. Mathematics, you see, has its inherent limitations. Physical scientists have known this for centuries. We formulate theories in such a way that the structures are mathematically convenient. The Gaussian distribution, for example, is prevalent not because it is some magical construct that reality has gifted to us. It is prevalent because it is mathematically convenient.

Pattern languages have been leveraged in many fuzzy domains. The original pattern language revolved around the discussion of architecture (i.e. buildings and towns). There are pattern languages that focus on user interfaces, on usability, on interaction design and on software process. None of these have concise mathematical underpinnings, yet we do extract real value from them. In fact, the specification of a pattern language is not too far off from the creation of a new algebra in mathematics. Algebras are strictly consistent, but they are purely abstract and need not have any connection with reality. Pattern languages, by contrast, are connected with reality, but their consistency rules are more relaxed. In our attempt to understand the complex world of machine learning (or learning in general) we cannot always leapfrog into mathematics. The reality may be that our current mathematics is woefully incapable of describing what is happening.

Visit www.deeplearningpatterns.com for ongoing updates.

2 thoughts on “A Pattern Language for Deep Learning”

  1. Nice blog. Refreshing to see some content that doesn’t immediately devolve into differential equations. I have to agree with you here. Having jumped into AI research recently, I get the sense that some of the art of the field is drowning in mathematics-only definitions… as if being able to compute/predict functions is the only important component of the field, or the only place to make progress. Sometimes I think that the importance of structural concepts like convolution, network depth, and pooling is drowning in calculus and matrix math jargon. (Perhaps because I don’t have a solid grasp of some of the math!) I understand the importance of being able to define things so that they can be translated into computing code, but I think some of the biggest ideas may still be waiting to be found in the area of network structures, timing, etc. that aren’t strictly defined in terms of computational or mathematical functions.

    1. The reality about mathematics (and this has been known for a very long time) is that there are only a few closed-form analytic solutions that allow us to make predictions. Most complex systems are mathematically intractable.

      Deep Learning is really an experimental science, and the current Deep Learning architectures used in practice are mathematically intractable. There are simpler forms that are used to study characteristics, and these are important in the sense that they give us a better understanding of how networks work.

      My opinion is that complexity emerges out of simple structures and that neural networks are successful due to the sheer size of the networks. There is something fundamental in nature that allows intelligence to emerge.
