The practice of software development has created development methodologies such agile development and lean methodology to tackle the complexity of development with the objective of improving the quality and efficiency of software creation. Although Deep Learning is built from software it is a different kind of software and therefore a different kind of methodology is needed. Deep Learning differs most from traditional software development in that a substantial portion of the process involves the machine learning how to achieve objectives. The developer is not completely out of the equation, but is working in concert to tweak the Deep Learning algorithm.
Deep Learning is sufficiently rich and complex a subject that a process model or methodology is required to guide a developer. The methodology addresses the necessary interplay of the need for more training data and the exploration of alternative Deep Learning patterns that drive the discovery of an effective architecture. The methodology depicted as follows:
We begin first we some initial definition of the kind of architecture we wish to train. This will of course be driven by the nature of the data that we a training from and the kind of prediction we seek. The latter is guided by Explanatory Patterns and the former by Feature Patterns. There are a variety ways to optimize our training process, this is guided by the Learning Patterns.
After the selection of our network model and the data we plan on training on, the developer is then tasked with answering the question as to whether adequate labeled training set is available. This process goes beyond conventional machine learning process that divides the dataset into three sets. The machine learning convention has been to create a training set, a validation set and a test set. In the first step of the process, if the training remains high there are several options that can be pursued. The first is to try to increase the size of the model, a second option is perhaps train a bit long (alternatively perform hyper-parameter tuning) and if all fails then the developer tweaks the architecture or attempts a new architecture. In the second step of the process, a develop validates the training against a validation set, if the error rate is high indicating overfitting then the options are to find more data, apply different regularizations and if all fails attempt another architecture. The observations here that differs from conventional machine learning is that Deep Learning has more flexibility in that a developer has the additional options of employing either a bigger model or using more data. One of the hallmarks of deep learning is its scalability in performing well when trained with large data sets.
Trying a larger model is something that a developer has control over, unfortunately finding more data poses a more difficult problem. To satisfy this need for more data one can leverage data from different contexts. In addition one can employing data synthesis and data augmentation to increase the size of training data. These approaches however lead to domain adaptation issues, so a slight change in the traditional machine learning development model is called for. In this extended approach, the validation and training sets are required to belong to the same context. Furthermore, to validate training from this heterogeneous set, another set called the training-validation set is set aside to act as additional validation. This basic process model, inspired by a talk by Andrew Ng, serves as a good scaffolding to hang off the many different patterns that we find in Deep Learning.
As you can see, there are many paths of exploration and many alternatives models that may be explored to reach to a solution. Furthermore, their is sufficient Modularity in Deep Learning that we may compose solutions from other previously developed solutions. Autoencoders, Neural Embedding, Transfer Learning and bootstrapping with pre-trained networks are some of the tools that do provide potential shortcuts that reduce the need to train from scratch.
This is no means complete, but it is definitely a good starting point to guide the development of this entirely new kind of architecture.