Intro
Artificial intelligence is rapidly moving from the laboratory into business and consumer applications. The result is a fundamental shift in how software is built and what it's capable of doing. And while we're still some way off from the artificial general intelligence portrayed in the movies, artificial narrow intelligence is a reality that is already powering some of the most successful technology businesses today, including Amazon, Facebook, Google and Apple.
The Architect of Artificial Intelligence: Deep Learning
Artificial intelligence has been one of the most remarkable advancements of the decade. People are shifting from explicit software development to building AI-based models, and businesses now rely on data-driven decisions rather than on someone manually defining rules. Everything is turning into AI, from chatbots to self-driving cars, speech recognition to language translation, robotics to medicine. AI is not new to researchers, though; it has been around since even before the '90s. So what is making it so popular and accessible to the world now?
In this introduction, we won't dive into the technical details of deep learning and neural nets, but I will share why I think deep learning is taking over other traditional methods. If you are not into deep learning and AI, let me explain it in simple, non-technical terms. Imagine you have to build a method to classify emails into categories like social, promotional, or spam, one of the prime AI tasks that Google performs for your Gmail inbox! How would you achieve this? Maybe you could make a list of words to look for in emails, like 'advertisement', 'subscribe', 'newsletter', etc., then write a simple string-matching regex to look for these words and classify an email as promotional or spam when they are found. The problem is: how many keywords can you catch this way, and how many rules can you manually write? Content on the internet is multiplying every day, and new keywords keep hopping in. A keyword-based approach simply won't land you good results.
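To make this concrete, here is a minimal sketch of the hand-written, rule-based approach described above. The keyword list and the classify_email helper are hypothetical, made up purely for illustration:

```python
import re

# Hypothetical, hand-picked keywords -- the kind of manual rule this section argues against.
PROMO_KEYWORDS = ["advertisement", "subscribe", "newsletter", "sale", "discount"]
PROMO_PATTERN = re.compile("|".join(PROMO_KEYWORDS), re.IGNORECASE)

def classify_email(text: str) -> str:
    """Label an email 'promotional' if any hard-coded keyword appears, else 'other'."""
    return "promotional" if PROMO_PATTERN.search(text) else "other"

print(classify_email("Subscribe to our newsletter for a 50% discount!"))  # promotional
print(classify_email("Hi Sam, are we still on for lunch tomorrow?"))      # other
```

As soon as marketers stop using the words on the list, this classifier silently fails, which is exactly the limitation described above.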
Now, give this a closer thought: you have a computer that can do keyword matching a million times faster than you. So rather than using your computationally powerful device just for simple string matching, why not let the computer decide the rules for classification too? What I mean is that a computer can go through thousands of data points and come up with more precise rules for the task in the time it takes you to think of just five such rules.
This is what deep learning is all about! Instead of you explicitly designing the rules and conditions you think would solve the problem (simple if-else statements, dictionaries of keywords, etc.), deep learning gives the computer the capability to produce the rules it will use to solve the problem. This makes it an end-to-end architecture: you feed the data into the network as input and tell it the desired output for each data point. The network then goes through the data and updates its rules accordingly, landing on an optimized set of rules.
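As a hedged illustration of this learn-the-rules-from-data idea, the sketch below trains a small model on a handful of toy, made-up emails instead of relying on hand-written keywords. The tiny dataset and the use of scikit-learn's MLPClassifier (a small neural network) are my own choices for brevity, not the author's setup:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Toy, made-up training data: the labels are the "desired output for each data point".
emails = [
    "Subscribe to our newsletter for exclusive deals",
    "Huge discount, buy now, limited time advertisement",
    "Hi mom, I'll call you this evening",
    "Meeting moved to 3pm, see agenda attached",
]
labels = ["promotional", "promotional", "personal", "personal"]

# Turn raw text into features and let a small neural net learn its own classification rules.
model = make_pipeline(
    TfidfVectorizer(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
model.fit(emails, labels)

print(model.predict(["Don't miss this advertisement, subscribe today"]))  # likely ['promotional']
```

The important difference is that nobody wrote the keyword rules here; the model inferred them from labeled examples.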
This decision-making ability is generally limited to us humans, right?
This is where Artificial Neural Networks (or simply neural nets) kick in. These are sets of nodes arranged in layers and connected through weights (which are nothing but matrices of numbers), in a similar way to how neurons are connected in our brain. Again, I won't go into the technical details of the architecture, its learning algorithms, or the mathematics behind it, but this is how deep learning mimics the brain's learning process.
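For a rough feel of what "nodes connected through weight matrices" means, here is a minimal NumPy sketch of a forward pass through one hidden layer. The layer sizes and the random weights are arbitrary, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary layer sizes: 4 input features, 3 hidden nodes, 2 output nodes.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)   # weights connecting input layer to hidden layer
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)   # weights connecting hidden layer to output layer

def relu(x):
    return np.maximum(0, x)

x = rng.normal(size=4)          # one input data point
hidden = relu(W1 @ x + b1)      # each hidden node computes Wx + b, then a nonlinearity
output = W2 @ hidden + b2       # the output layer combines the hidden nodes the same way
print(output)
```

Training consists of nudging the numbers inside W1, b1, W2, and b2 until the outputs match the desired labels.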
Let's take another example: suppose you have to recognize a human face in an image, and the face could be located anywhere in the image. How would you proceed?
One obvious way is to define a set of key points over the human face which together characterize it; generally these come in sets of 128 or 68. When interconnected, these points form an image mask (a short code sketch of this landmark-based approach appears after the figure below). But what if the orientation of the face changes from a frontal view to a side view? The facial geometry that helped these points identify a face changes, and so the key-point method won't detect the face.
68 typical key points of a human face
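Landmark detection along these lines can be sketched with the dlib library. This is not the author's code, and the path to the pre-trained 68-landmark model file and the input image are assumptions; you would need to download the model separately:

```python
import dlib

# Assumes dlib's pre-trained 68-point landmark model has been downloaded
# (commonly distributed as "shape_predictor_68_face_landmarks.dat").
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

img = dlib.load_rgb_image("face.jpg")      # hypothetical input image
for face in detector(img, 1):              # detect face rectangles
    landmarks = predictor(img, face)       # fit the 68 key points inside each rectangle
    points = [(landmarks.part(i).x, landmarks.part(i).y) for i in range(68)]
    print(points[:5])                      # first few (x, y) key points
```

Notice that the 68 locations themselves encode a human's idea of what matters on a face, which is exactly the assumption the next paragraph questions.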
Deep learning makes this possible too! The key points we used were based on a human's perception of facial features (like the nose, ears, and eyes), so to detect a face we try to make the computer find these features together in an image. But guess what: these manually selected features are not so pronounced to a computer. Deep learning instead makes the computer go through a lot of faces (containing all sorts of distortions and orientations) and lets the computer decide which feature maps seem relevant for face detection. After all, the computer has to recognize the face, not you! And this gives surprisingly good results. You can go through one of my projects here, where I used ConvNets (a deep learning architecture) to recognize facial expressions.

Needing a large dataset of faces just to recognize a face may strike you as a problem. But one-shot learning methods such as the Siamese network have solved this too. The approach is based on a special loss function, the triplet loss, introduced in the FaceNet paper. I won't discuss it here; if you wish to know about it, you can go through the paper here.
Siamese Network for Gender Detection, Image taken from www.semanticscholar.org
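To give a flavor of the triplet loss mentioned above, here is a minimal NumPy sketch under my own simplifying assumptions (the embeddings are plain toy vectors and the margin value is arbitrary); it is not the FaceNet implementation:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Push the anchor closer to the positive (same person) than to the negative
    (different person) by at least `margin`, in squared Euclidean distance."""
    pos_dist = np.sum((anchor - positive) ** 2)
    neg_dist = np.sum((anchor - negative) ** 2)
    return max(pos_dist - neg_dist + margin, 0.0)

# Toy embeddings standing in for the network's face encodings.
a = np.array([0.10, 0.90, 0.00])
p = np.array([0.15, 0.85, 0.05])   # same identity as the anchor
n = np.array([0.90, 0.10, 0.40])   # different identity
print(triplet_loss(a, p, n))        # small loss: the anchor is already closer to the positive
```

Training on many such triplets pulls embeddings of the same person together, which is what makes recognition from a single reference photo possible.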
Another myth about deep learning is that it is a black box: that there is no feature engineering or mathematics behind the architecture, and so it simply replicates the data without actually providing a reliable, long-term solution to the problem. No, it's not like that! Deep learning involves mathematics and probability in much the same way traditional machine learning methods do, be it simple linear regression or support vector machines. Deep learning uses the same gradient descent update to look for optimized parameter values as linear regression does. The cost function, the hypothesis, and the calculation of the error from the target value (the loss) are all done in a similar, equation-based fashion to traditional algorithms. Activation functions in deep nets are nothing but mathematical functions.

Once you understand every mathematical aspect of deep learning, you can figure out how to build a model for a specific task and what changes need to be made. It's just that the mathematics involved in deep learning turns out to be a little complex. But if you get the concepts right, it's no longer a black box to you! In fact, this is true of every algorithm. As far as my own learning goes, I've made my way through the mathematics behind it, beginning from a simple perceptron, the standard Wx + b equation of a neuron, and back-propagation, through to modern architectures such as CNNs, LSTMs, Encoder-Decoders, Sequence-to-Sequence models, etc.
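As a small illustration of the point that the same gradient descent update drives both linear regression and deep nets, here is a toy NumPy example fitting y = 2x + 1; the data, learning rate, and number of steps are arbitrary choices of mine:

```python
import numpy as np

# Toy data generated from y = 2x + 1 with a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2 * x + 1 + 0.05 * rng.normal(size=100)

w, b = 0.0, 0.0   # parameters: exactly the W and b of a single "neuron"
lr = 0.1          # learning rate

for _ in range(500):
    y_hat = w * x + b                      # hypothesis (the Wx + b of a neuron)
    grad_w = np.mean(2 * (y_hat - y) * x)  # gradient of the mean squared error w.r.t. w
    grad_b = np.mean(2 * (y_hat - y))      # gradient w.r.t. b
    w -= lr * grad_w                       # the same update rule deep nets apply, layer by layer
    b -= lr * grad_b

print(round(w, 2), round(b, 2))            # roughly 2.0 and 1.0
```

A deep network does the same thing, only with many more parameters and with back-propagation computing the gradients through every layer.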
Deep Learning vs Classical Machine Learning
Over the past several years, deep learning has become the go-to technique for most AI problems, overshadowing classical machine learning. The clear reason for this is that deep learning has repeatedly demonstrated superior performance on a wide variety of tasks, including speech, natural language, vision, and playing games. Yet although deep learning achieves such high performance, there are still a few advantages to using classical machine learning, and a number of specific situations where you'd be much better off with something like a linear regression or a decision tree rather than a big deep network.
In this post we’re going to compare and contrast deep learning vs classical machine learning techniques. In doing so we’ll identify the pros and cons of both techniques and where/how they are best used.
Deep Learning > Classical Machine Learning
Best-in-class performance: Deep networks have achieved accuracies that are far beyond that of classical ML methods in many domains including speech, natural language, vision, and playing games. In many tasks, classical ML can’t even compete. For example, the graph below shows the image classification accuracy of different methods on the ImageNet dataset; blue colour indicates classical ML methods and red colour indicates a deep Convolutional Neural Network (CNN) method. Deep learning blows classical ML out of the water here.
Scales effectively with data: Deep networks scale much better with more data than classical ML algorithms. The graph below is a simple yet effective illustration of this. Often, the best advice for improving accuracy with a deep network is simply to use more data! With classical ML algorithms this quick and easy fix doesn't work nearly as well, and more complex methods are often required to improve accuracy.
No need for feature engineering: Classical ML algorithms often require complex feature engineering. Usually, an in-depth exploratory data analysis is performed on the dataset first. Dimensionality reduction might then be applied for easier processing. Finally, the best features must be carefully selected to pass to the ML algorithm. There's no need for this when using a deep network, as one can just pass the data directly to the network and usually achieve good performance right off the bat. This completely eliminates the large and challenging feature engineering stage of the process.
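For context, here is a hedged sketch of the kind of manual pipeline described above, using scikit-learn on a toy dataset; the particular steps (scaling, PCA, feature selection) and the Iris data are illustrative choices, not a recommendation:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Manual feature pipeline: scale, reduce dimensionality, pick the best features, then fit the model.
clf = Pipeline([
    ("scale", StandardScaler()),
    ("reduce", PCA(n_components=3)),
    ("select", SelectKBest(f_classif, k=2)),
    ("model", LogisticRegression(max_iter=1000)),
])
clf.fit(X, y)
print(clf.score(X, y))   # training accuracy on the toy data
```

Every step before the final model is a hand-designed decision; a deep network is typically fed the raw inputs and learns its own representation instead.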
Adaptable and transferable: Deep learning techniques can be adapted to different domains and applications far more easily than classical ML algorithms. Firstly, transfer learning has made it effective to use pre-trained deep networks for different applications within the same domain. For example, in computer vision, pre-trained image classification networks are often used as a feature extraction front-end to object detection and segmentation networks. The use of these pre-trained networks as front-ends eases the full model’s training and often helps achieve higher performance in a shorter period of time. In addition, the same underlying ideas and techniques of deep learning used in different domains are often quite transferable. For example, once one understands the underlying deep learning theory for the domain of speech recognition, then learning how to apply deep networks to natural language processing isn’t too challenging since the baseline knowledge is quite similar. With classical ML this isn’t the case at all as both domain specific and application specific ML techniques and feature engineering are required to build high-performance ML models. The knowledge base of classical ML for different domains and applications is quite different and often requires extensive specialized study within each individual area.
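To illustrate the pre-trained front-end idea in a hedged way, here is a short Keras sketch that reuses an ImageNet-trained backbone as a frozen feature extractor; the choice of ResNet50, the input size, and the 10-class head are assumptions for the example, not the author's setup:

```python
import tensorflow as tf

# Pre-trained ImageNet backbone used purely as a feature-extraction front-end.
backbone = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3)
)
backbone.trainable = False   # freeze the pre-trained weights

# Small task-specific head on top (10 classes is an arbitrary placeholder).
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```

Only the small head needs to be trained on the new task, which is why this approach tends to reach good performance quickly.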
Classical Machine Learning > Deep Learning
Works better on small data: To achieve high performance, deep networks require extremely large datasets. The pre-trained networks mentioned before were trained on 1.2 million images. For many applications, such large datasets are not readily available and would be expensive and time-consuming to acquire. For smaller datasets, classical ML algorithms often outperform deep networks.
Financially and computationally cheap: Deep networks require high-end GPUs to be trained in a reasonable amount of time on big data. These GPUs are very expensive, yet without them, training deep networks to high performance would not be practically feasible. To use such high-end GPUs effectively, a fast CPU, SSD storage, and fast, plentiful RAM are also required. Classical ML algorithms can be trained just fine with a decent CPU, without requiring the best of the best hardware. Because they aren't so computationally expensive, one can also iterate faster and try out many different techniques in a shorter period of time.
Easier to interpret: Due to the direct feature engineering involved in classical ML, these algorithms are quite easy to interpret and understand. In addition, tuning hyper-parameters and altering model designs is more straightforward, since we have a more thorough understanding of the data and the underlying algorithms. On the other hand, deep networks are very much a "black box" in that even now researchers do not fully understand their "inside". Choosing hyper-parameters and designing the network are also quite a challenge due to the lack of a solid theoretical foundation.
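As a small, hedged illustration of that interpretability gap, a classical model such as a decision tree exposes which input features it relies on directly; the dataset and model choice here are mine, for illustration only:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(data.data, data.target)

# Each feature gets an explicit importance score -- something a deep network does not expose directly.
for name, importance in zip(data.feature_names, tree.feature_importances_):
    print(f"{name}: {importance:.2f}")
```

Reading the equivalent information out of millions of deep-network weights is far harder, which is the point being made above.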