I spend most of my waking time (and likely my subconscious works overtime while I sleep) studying Deep Learning. Peter Thiel has a phrase, the “Last Mover Advantage”. Basically, you don’t necessarily need the “First Mover Advantage”, but you absolutely want to be the last company standing in your kind of business. So Google may be the last Search company, Amazon may be the last E-Commerce company and Facebook hopefully will not be the last Social Networking company. What keeps me awake at night, though, is that Deep Learning could in fact be the “Last Invention of Man”!
However, let’s ratchet it down a little bit here. After all, Kurzweil’s Singularity (his estimate is 2045) is still about three decades away. That’s still plenty of time for us humans to scheme on our little monopolies. Your objective over the next 30 years of humankind is to figure out whether you are going to be living in Elysium or in some unnamed decaying backwater:
Credit: Elysium the movie, not the life-extension supplement.
Or, said differently, whether you take the blue pill or the red pill (or maybe I meant state).
To aid you in your decision making, here are 11 reasons your “experts” will give you that lead to the unfortunate reality of missing the train:
Figure: Illustration by John Manoogian III. Cognitive biases can be organized into four categories: biases that arise from too much information, not enough meaning, the need to act quickly, and the limits of memory. From https://en.wikipedia.org/wiki/List_of_cognitive_biases
It’s just Machine Learning (It is a generalization of what I used to do. Well Traveled Road Effect Bias)
A practitioner’s introduction to neural networks is almost always by way of linear regression and then logistic regression. That’s because the equation for a single unit of an artificial neural network (ANN) has the same form: a weighted sum of inputs passed through an activation function. So there is an immediate bias that the characteristics of these classical ML methods will also carry over into the world of DL. After all, DL in its most naive explanation is nothing more than an ANN with multiple layers.
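To see why the bias is so tempting, here is a minimal sketch in plain Python. The function names (`neuron`, `sigmoid`, `identity`) and the sample numbers are my own illustrative choices, not taken from any library. It shows that a single neuron with an identity activation computes linear regression, and with a sigmoid activation computes logistic regression.

```python
import math

def weighted_sum(x, w, b):
    # The shared core of all three models: w . x + b
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def linear_regression(x, w, b):
    # Prediction is the raw weighted sum.
    return weighted_sum(x, w, b)

def logistic_regression(x, w, b):
    # Prediction is the weighted sum squashed into (0, 1).
    return 1.0 / (1.0 + math.exp(-weighted_sum(x, w, b)))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def identity(z):
    return z

def neuron(x, w, b, activation):
    # One artificial neuron: activation(w . x + b)
    return activation(weighted_sum(x, w, b))

# Hypothetical inputs and weights, chosen only for illustration.
x = [0.5, -1.2, 3.0]
w = [0.1, 0.4, -0.2]
b = 0.3

same_as_linear = math.isclose(neuron(x, w, b, identity), linear_regression(x, w, b))
same_as_logistic = math.isclose(neuron(x, w, b, sigmoid), logistic_regression(x, w, b))
print(same_as_linear, same_as_logistic)
```

The equations match at the level of one unit. The rest of this section argues that the resemblance stops being a reliable guide once many such units are stacked into layers.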
There are also other kinds of ML methods whose equations differ from those of DL. The basic objective of all ML methods, however, is a general notion of curve fitting: if the model fits the data well, then perhaps it is a good solution. Unfortunately, because the number of parameters in a DL model is so large, these systems will by default over-fit any data. This is enough of a tell that DL is an entirely different kind of animal from a classical ML system.
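To make the classical over-fitting worry concrete, here is a toy sketch. It uses polynomial curve fitting as a stand-in, not a deep network, and the data set is made up for illustration: six points on the line y = x with alternating noise of ±0.5. A model with as many parameters as data points (a degree-5 polynomial through 6 points) fits the training data perfectly, yet does worse between the points than a humble two-parameter line.

```python
def lagrange_predict(xs, ys, x):
    # Evaluate the unique degree-(n-1) polynomial through all n points.
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def fit_line(xs, ys):
    # Ordinary least squares for y = a * x + b (two parameters).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up training data: the true relationship is y = x, plus fixed noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
train_y = [x + e for x, e in zip(train_x, noise)]

# Held-out points between the training points, with noise-free targets.
test_x = [0.5, 4.5]
test_y = [0.5, 4.5]

a, b = fit_line(train_x, train_y)
big_model = lambda x: lagrange_predict(train_x, train_y, x)
small_model = lambda x: a * x + b

big_train, big_test = mse(big_model, train_x, train_y), mse(big_model, test_x, test_y)
small_train, small_test = mse(small_model, train_x, train_y), mse(small_model, test_x, test_y)
print("6-parameter polynomial: train", big_train, "test", big_test)
print("2-parameter line:       train", small_train, "test", small_test)
```

This is exactly the behavior classical ML intuition predicts for a model with too many parameters, which is why it is surprising that DL systems with far more parameters than this can still generalize.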
It’s just Optimization (It is a simple case of what I usually do. Illusion of Control Bias)
DL systems have a loss function that measures how well their predictions match the training data. Classic optimization problems also have loss functions (also known as objective functions). In both cases, different kinds of heuristics are used to discover an optimal point in a large configuration space. It was once thought that the solution surface of a DL system was so complex that it would be impossible to arrive at a solution. Curiously enough, however, one of the simplest optimization methods, the Stochastic Gradient Descent (SGD) algorithm, is all that is needed to arrive at surprising results.
What this tells you is that there is something else going on here that is actually very different from what optimization folks are used to.
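For readers who have not seen it, the loss-function-plus-SGD recipe really is this simple. The following is a minimal sketch, assuming a two-parameter linear model and a squared-error loss. The function name `sgd_linear_fit`, the learning rate, the epoch count and the synthetic data are all my own illustrative choices. A real DL system applies the same update rule to millions of parameters.

```python
import random

def loss(data, w, b):
    # Mean squared error between predictions w * x + b and targets y.
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def sgd_linear_fit(data, lr=0.05, epochs=200, seed=0):
    # Stochastic Gradient Descent: visit one example at a time, in random
    # order, and nudge each parameter against the gradient of that
    # example's squared error.
    rng = random.Random(seed)
    samples = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            err = (w * x + b) - y
            w -= lr * 2.0 * err * x  # d(err^2)/dw = 2 * err * x
            b -= lr * 2.0 * err      # d(err^2)/db = 2 * err
    return w, b

# Synthetic, noise-free data from y = 2x + 1 on [-1, 1].
data = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]

initial_loss = loss(data, 0.0, 0.0)
w, b = sgd_linear_fit(data)
final_loss = loss(data, w, b)
print("w =", w, "b =", b, "loss:", initial_loss, "->", final_loss)
```

On a problem this small nobody is surprised that it converges. The surprise the section describes is that essentially the same loop keeps working on the enormous, non-convex loss surfaces of deep networks.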
It’s a black box (I can’t trust the unknown. Ambiguity Effect)
A lot of Data Scientists have an aversion to DL because of the lack of interpretability of its predictions. This is a characteristic not only of DL methods but of classical ML methods as well. Data Scientists would rather use probabilistic methods, where they have better control over the models or priors. As a result, they end up with systems that make predictions with the smallest number of parameters, all driven by the belief that parsimony, or Occam’s razor, is the optimal explanation for everything.
Unfortunately, probabilistic methods are not competitive in classifying images, speech or even text. That’s because DL methods are better at discovering models than human beings are. Brute force just happens to trump wetware. There’s no experimental evidence in the DL space that parsimonious models work any better than entangled models. For those cases where some kind of explanation is an absolute requirement, there are now newer methods in DL that aid interpretability and provide estimates of uncertainty. If a DL system can generate captions for an image, then there is a good chance that it can be trained to generate an explanation of a prediction.
It’s too early and too soon (I don’t trust anything that’s new. Illusion of Validity Bias)
This is the natural bias that something around 5 years old and rapidly evolving is too new and volatile a technology to trust. I think we all said the same thing when the microprocessor, internet, web and mobile technologies came along. Wait and see was the safe approach for almost everyone. This is certainly a reasonable approach for anyone who has not really spent the time investigating the details. However, it is a very risky strategy. Ignorance may be bliss, but another company eating your lunch can mean extinction.
There is too much hype. (Conservatism Bias)
There are a lot of things that DL can do that were deemed inconceivable just a couple of years ago. Nobody expected a computer to beat the best human player in Go. Nobody expected self-driving cars to exist today. Nobody expected to see Star Trek universal translator-like capabilities. It is so unbelievable that it seems more likely to be an exaggeration than something real. I hate to burst your bubble of ignorance, however: DL is in fact very real, and you experience it yourself with every smartphone.
AI winter will likely come again. (Frequency Illusion Bias)
There have been many times when the promise of AI has led to disappointing results. The argument goes further: because it has happened so often before, it is bound to happen again. The problem with this argument is that, despite the disappointments, AI research has led to many software capabilities that we take for granted today and thus never notice. Good old-fashioned AI (GOFAI) is embedded in many systems today.
The current pace of DL development is accelerating, and there are certainly big problems that still need to be solved. The need for a lot of training data and the lack of unsupervised training are two of them. This, however, doesn’t mean that what we have today has no value. DL can already drive cars. That in itself tells you that even if another AI winter arrives, we will have reached a state of development that is still quite useful.
There’s not enough theory of how it works. (System Justification Bias)
The research community does not have a solid theoretical understanding of why DL works so effectively. We have some idea as to why a multi-layer neural network is more efficient at fitting functions than one with fewer layers. We don’t, however, understand why convergence even occurs or why good generalization happens. DL at this time is very experimental, and we are just learning to characterize these kinds of systems. Meanwhile, despite the lack of a good theoretical understanding, the engineering barrels forward. Researchers, using their intuition and educated guesses, are able to build increasingly better models. In other words, nobody is stopping their work to wait for a better theory. It is almost analogous to what happens in biotechnology research. People are experimenting with many different combinations and arriving at new discoveries that they have yet to explain. Scientific and technological progress is very messy, and one shouldn’t shy away from the benefits because of the chaos.
It is not biologically inspired. (Anthropomorphism Bias)
DL systems are very unlike the neurons in our brain. The mechanism by which DL learns (i.e. SGD) is not something we can explain as happening in our brain. The argument here is that if it doesn’t resemble the brain, then it is unlikely to be able to perform the kind of inference and learning that a brain does. This, of course, is an extremely weak argument. After all, planes don’t look like birds, but they certainly can fly.
I’m not an expert in it. (Not Invented Here Bias)
Not having expertise in-house shouldn’t be an excuse for not seeking expertise outside. Furthermore, nothing should prevent you from having your own experts learn this new technology. However, if these experts are of the dogmatic persuasion, then that should be a tell for you to get a second, unbiased opinion.
It does not apply to my problems (Ostrich Effect)
Businesses are composed of many business processes. Unless you have gone through the exercise of examining which processes can be automated with current DL technologies, you are not in a position to state that DL does not apply to you. Furthermore, you may discover new processes and business opportunities that do not exist today but become possible by exploiting DL technology. You cannot really answer this question until you have invested in some due diligence work.
I don’t have the resources (Status Quo Bias)
The large internet companies like Google and Facebook have gobbled up a lot of the Deep Learning talent out there. These companies have very little interest in working with a small business to identify their specific needs and opportunities. However, fortunately these big companies have been gracious enough to allow their researchers to publish their work. We therefore do have a view into their latest developments and thus are able to take what they’ve learned and apply it to your context. There are companies like Intuition Machine that do have an onboarding process for you to get a competitive head start in DL technologies.
Please reach out to Intuition Machine to learn how to use Deep Learning in your business.