Computer vision is the field of computer science that aims to replicate parts of the complexity of the human visual system, so that computers can identify and process objects in images and videos the way humans do. With computer vision, a computer can extract useful information from a single image or a sequence of images.
Computer vision is also regarded as a field of Artificial Intelligence that works on enabling computers to see, identify and process images in the same way that human vision does, and then provide an appropriate output.
Initially, computer vision worked only in a limited capacity, but thanks to advances in deep learning and neural networks, the field has taken great leaps in recent years and has even surpassed humans in some tasks, such as detecting and labeling objects.
Contribution of Deep Learning in the field of Computer Vision
Significant obstacles remain in the path of computer vision, but deep learning systems have made progress on some of the relevant sub-tasks. This progress stems from the additional responsibilities assigned to deep learning systems.
It is reasonable to say that the biggest difference with deep learning systems is that their features no longer need to be programmed by hand. Rather than searching for specific, predefined features, the neural networks inside deep learning systems are trained to learn them from data. With the increased computational power available to modern deep learning systems, there is steady progress towards the point where computers will be able to recognize what they see and perform tasks accordingly.
Application of Computer Vision
The field of computer vision is too broad to cover in depth here. Its techniques help a computer extract, analyze, and understand information from a single image or a sequence of images.
There are many advanced techniques, such as style transfer, colorization, 3D objects, and human pose estimation, but in this article we will focus only on the commonly used techniques of computer vision.
These techniques are:
- Image Classification
- Image Classification with Localization
- Object Segmentation
- Object Detection
We will go through all of the techniques listed above and see in detail how deep learning is used in each of them. To avoid confusion, the material is split into a series of blogs. This first blog covers the first technique, image classification, and how deep learning is useful for it.
Image Classification | Computer Vision
Image classification is the process of predicting a class, or label, for something defined by a set of data points. It is a classification problem in which an entire image is assigned a single label. For example, a picture may be classified as a daytime or a nighttime image. Similarly, we can classify images of cars and motorcycles and sort them accordingly.
There can be many categories, or classes, into which a given image may be classified. Now consider a process where images are compared and similar ones are grouped according to their characteristics, but without knowing in advance what you are actually looking for.
Obviously, this is a difficult task. To make it even more difficult, assume that the set of images numbers in the hundreds of thousands.
There are many image classification benchmarks; two popular ones are CIFAR-10 and CIFAR-100. Their photographs have to be classified into 10 and 100 classes, respectively.
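As a rough illustration of what "assigning a label" means for a 10-class task such as CIFAR-10, the sketch below turns one score per class into a predicted label by taking the highest-scoring class. The helper names (`softmax`, `predict_label`) and the score values are made up for illustration; in practice the scores would come from a trained model.

```python
import math

# The ten CIFAR-10 class names, in the dataset's standard label order.
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(scores, class_names=CIFAR10_CLASSES):
    """Return the highest-scoring class name and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]

# Hypothetical scores for a single image (illustrative values only).
example_scores = [0.1, 0.3, 2.5, 4.0, 0.2, 1.1, 0.0, 0.4, 0.3, 0.2]
label, prob = predict_label(example_scores)
print(label, round(prob, 3))  # the label is "cat" (index 3 has the top score)
```

The whole image receives exactly one label here, which is what distinguishes classification from detection or segmentation.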
Deep learning for Image Classification
The deep learning architecture used for image classification includes convolutional layers, making it a convolutional neural network (CNN). The typical use case for a CNN is one where you feed the network images and the network classifies them.
Convolutional neural networks start with a “scanner” at the input, which is not intended to parse the entire input at once. For example, if we want to input an image of 100 x 100 pixels, we don’t want a layer of 10,000 nodes.
Instead, you can create a scanning input layer of 10 by 10, which is fed the first 10 by 10 pixels of the image. Once that input has been passed through, you feed it the next 10 by 10 pixels by moving the scanner one pixel to the right.
This technique is called sliding windows.
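A minimal sketch of the scanning idea, assuming a grayscale image stored as a list of rows: a 10 by 10 window moves one pixel at a time across a 100 x 100 image. The function name `sliding_windows` and the dummy image are illustrative, not part of any library.

```python
def sliding_windows(image, window=10, stride=1):
    """Yield (row, col, patch) for each window position, left to right, top to bottom."""
    height, width = len(image), len(image[0])
    for row in range(0, height - window + 1, stride):
        for col in range(0, width - window + 1, stride):
            patch = [line[col:col + window] for line in image[row:row + window]]
            yield row, col, patch

# A dummy 100 x 100 grayscale image whose pixel value encodes its position.
image = [[r * 100 + c for c in range(100)] for r in range(100)]

positions = list(sliding_windows(image))
print(len(positions))  # 91 * 91 = 8281 window positions
```

Each patch holds only 100 values, so the scanner needs 100 inputs rather than 10,000; a convolutional layer reuses the same weights at every one of these positions.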
Layers used to build Convolutional Neural Networks:
- Input will hold the raw pixel values of the image. In this case an image of width 32, height 32, and with three color channels Red, Green and Blue is used.
- The Convolutional layer will compute the outputs of neurons that are connected to local regions in the input. Each neuron computes the dot product between its weights and the small region it is connected to in the input volume. This may result in a volume such as 32 by 32 by 12 if we decide to use 12 filters.
- The RELU layer will apply an element-wise activation function, max(0, x), thresholding at zero. This leaves the size of the volume unchanged.
- The POOL layer will perform a downsampling operation along the spatial dimensions (width and height), resulting in a smaller volume, such as 16 by 16 by 12 when 2 by 2 pooling is applied to a 32 by 32 by 12 volume.
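The effect of these layers on the volume's shape can be traced with a few lines of arithmetic. The sketch below assumes hyperparameters the article does not specify (3 by 3 filters, padding 1, stride 1, and 2 by 2 max pooling with stride 2); the helper functions are illustrative and work on plain nested lists.

```python
def conv_output_size(size, kernel, padding, stride):
    """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
    return (size - kernel + 2 * padding) // stride + 1

def relu(grid):
    """Element-wise max(0, x) on a 2D grid; the shape is unchanged."""
    return [[max(0, x) for x in row] for row in grid]

def max_pool_2x2(grid):
    """2 x 2 max pooling with stride 2 on a 2D grid with even dimensions."""
    return [
        [max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
         for c in range(0, len(grid[0]), 2)]
        for r in range(0, len(grid), 2)
    ]

# Shape trace for a 32 x 32 x 3 input.
# Assumed hyperparameters: 12 filters of size 3 x 3, padding 1, stride 1.
side = conv_output_size(32, kernel=3, padding=1, stride=1)
conv_shape = (side, side, 12)            # one output channel per filter
relu_shape = conv_shape                  # ReLU does not change the shape
pool_shape = (side // 2, side // 2, 12)  # pooling halves width and height
print(conv_shape, relu_shape, pool_shape)  # (32, 32, 12) (32, 32, 12) (16, 16, 12)

# ReLU followed by pooling on a tiny 4 x 4 slice.
tiny = [[-1, 2, 0, -3],
        [4, -5, 6, 1],
        [-2, -2, 3, 3],
        [0, 7, -8, 9]]
pooled = max_pool_2x2(relu(tiny))
print(pooled)  # [[4, 6], [7, 9]]
```

With a different filter size or no padding, the convolution output would be smaller than 32 by 32, so the 32 by 32 by 12 figure depends on these assumed settings.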
Output of the Model History
In this way, ConvNets transform the original image pixel values into the final class scores. Keep in mind that some layers contain parameters and others don’t. In particular, the convolutional layers perform transformations that are a function of both the activations in the input volume and the parameters (the weights and biases of the neurons), whereas the RELU and POOL layers implement a fixed function.
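To make the distinction concrete, here is a small sketch that counts learnable parameters per layer. The 3 by 3 filter size is an assumption for illustration (the article only states that 12 filters are used on a three-channel input), and `conv_param_count` is a hypothetical helper.

```python
def conv_param_count(num_filters, kernel, in_channels):
    """Each filter has kernel * kernel * in_channels weights plus one bias."""
    return num_filters * (kernel * kernel * in_channels + 1)

# Assumed setup: 12 filters of size 3 x 3 over a 3-channel (RGB) input.
layer_params = {
    "CONV": conv_param_count(num_filters=12, kernel=3, in_channels=3),
    "RELU": 0,  # fixed function max(0, x), nothing to learn
    "POOL": 0,  # fixed downsampling, nothing to learn
}
for name, count in layer_params.items():
    print(f"{name}: {count} learnable parameters")
# CONV: 336 learnable parameters (12 * (3 * 3 * 3 + 1))
```

Training adjusts only the convolutional weights and biases; the RELU and POOL layers behave the same before and after training.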
Conclusion
This article has focused on image classification and the deep learning architecture used for it. But there is more to explore in computer vision than classification alone: detecting, segmenting and localizing objects are equally important tasks.
Article By: Amit Shukla