Object Detection Using YOLO

YOLO ("You Only Look Once") is a series of end-to-end deep learning models designed for fast object detection, first introduced by Joseph Redmon et al. in 2015. As images become more complex, building and training a deep learning model from scratch often demands more computational resources than are available.

So predefined frameworks and pretrained models come in handy. One such framework for object detection is YOLO. It is a very fast, accurate, state-of-the-art framework. YOLO is implemented on Darknet.

YOLO Algorithm

The YOLO architecture is based on a convolutional neural network (CNN), and it can be customized according to the user's requirements.

Step 1: Read the input image

Let C be the number of classes. In this example, C = 3 and the class labels are C1 = Chair, C2 = Laptop, C3 = Car.

Step 2: Divide the image into MxM cells

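For each grid cell Xij, a label Y is calculated (Xij → Y). The label Y is an N-dimensional vector, where N depends on the number of classes. With one bounding box per cell, N = 5 + C: one object-presence score, four bounding-box values, and C class probabilities.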

The above diagram shows the vector representation of label Y
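As a rough illustration of Steps 1 and 2, the sketch below assigns an object to a grid cell and builds that cell's label vector. The layout Y = [pc, bx, by, bw, bh, c1, c2, c3], the helper names `locate_cell` and `make_label`, and the sample numbers are assumptions made for this example; they are not taken from Darknet.

```python
from typing import List, Optional, Tuple

# Class order follows the article's example (C = 3); the index mapping is an assumption.
CLASSES = ["Chair", "Laptop", "Car"]


def locate_cell(cx: float, cy: float, img_w: int, img_h: int, m: int) -> Tuple[int, int]:
    """Return (row, col) of the M x M grid cell that contains the pixel point (cx, cy)."""
    col = min(int(cx / img_w * m), m - 1)
    row = min(int(cy / img_h * m), m - 1)
    return row, col


def make_label(box: Optional[Tuple[float, float, float, float]] = None,
               class_name: Optional[str] = None) -> List[float]:
    """Build Y = [pc, bx, by, bw, bh, c1, ..., cC] for a single grid cell."""
    y = [0.0] * (5 + len(CLASSES))
    if box is None:
        return y  # empty cell: pc = 0 and the other entries carry no information
    y[0] = 1.0                                  # pc: an object is present
    y[1:5] = list(box)                          # bx, by, bw, bh
    y[5 + CLASSES.index(class_name)] = 1.0      # one-hot class entry
    return y


# A car centred at pixel (250, 40) in a 300x300 image with a 3x3 grid.
cell = locate_cell(250, 40, 300, 300, 3)
car = make_label(box=(0.5, 0.4, 0.3, 0.2), class_name="Car")
empty = make_label()
print(cell, car)
```

Only the cell that contains the centre of an object gets a non-zero label; every other cell keeps the all-zero vector.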

Step 3: Apply image classification and localization to each grid cell and predict the bounding box

The (x, y) coordinates represent the centre of the bounding box relative to the grid cell location, and (w, h) are the dimensions of the bounding box. Both are normalized to the range [0, 1].
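To make the coordinate convention concrete, here is a minimal sketch that converts one cell-relative prediction into pixel corners. It assumes (x, y) is an offset inside the cell and (w, h) is a fraction of the whole image; the function name `cell_box_to_pixels` and the sample values are hypothetical.

```python
def cell_box_to_pixels(row, col, x, y, w, h, grid_size, img_w, img_h):
    """Convert a cell-relative box to (x_min, y_min, x_max, y_max) in pixels.

    Assumption: (x, y) is the box centre as a fraction of the cell,
    and (w, h) is the box size as a fraction of the full image.
    """
    cell_w = img_w / grid_size
    cell_h = img_h / grid_size
    cx = (col + x) * cell_w      # absolute centre, x direction
    cy = (row + y) * cell_h      # absolute centre, y direction
    bw = w * img_w               # absolute width
    bh = h * img_h               # absolute height
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)


# Centre cell of a 3x3 grid on a 300x300 image.
corners = cell_box_to_pixels(row=1, col=1, x=0.5, y=0.5, w=0.5, h=0.25,
                             grid_size=3, img_w=300, img_h=300)
print(corners)
```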
Intersection over Union (IoU) is an evaluation metric used to measure the accuracy of an object detector on a dataset. It is the area of overlap between the predicted box and the ground-truth box divided by the area of their union.
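A small sketch of the IoU computation for two axis-aligned boxes follows. The corner format (x_min, y_min, x_max, y_max) and the function name `iou` are choices made for this example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap area 1, union area 7
```

An IoU of 1 means the boxes coincide, and 0 means they do not overlap at all.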

Step 4: Predict the class probabilities of the object

Class probabilities are predicted as P(Class | Object). This probability is conditioned on the grid cell containing an object.
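Because the class probabilities are conditional, a common way to score a cell is to multiply each one by the object-presence score. The sketch below shows that product for a single cell; the vector layout [pc, bx, by, bw, bh, c1, c2, c3] and the sample numbers are assumptions for illustration.

```python
CLASS_NAMES = ["Chair", "Laptop", "Car"]


def class_scores(y, class_names):
    """Return P(class) = pc * P(class | object) for each class in one cell."""
    pc = y[0]            # probability that the cell contains an object
    conditional = y[5:]  # P(class | object) for each class
    return {name: pc * p for name, p in zip(class_names, conditional)}


# Hypothetical prediction for one cell.
y_pred = [0.5, 0.5, 0.5, 0.25, 0.25, 0.25, 0.5, 0.25]
scores = class_scores(y_pred, CLASS_NAMES)
best = max(scores, key=scores.get)
print(scores, best)
```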

The vector Y for the first grid looks like this

The output of this step is 3x3x8 values, i.e., an 8-dimensional vector is computed for each grid cell.
In a real-world scenario, the grid can be much larger, such as 13×13, and the size of the output varies accordingly.
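The output size follows directly from the grid size and the number of classes. The sketch below assumes one bounding box per cell, so each cell holds 5 + C values; the function name `output_shape` is hypothetical.

```python
def output_shape(grid_size, num_classes):
    """Shape of the prediction tensor with one box per cell: (M, M, 5 + C)."""
    return (grid_size, grid_size, 5 + num_classes)


def total_values(shape):
    """Total number of predicted values in the tensor."""
    rows, cols, depth = shape
    return rows * cols * depth


small = output_shape(3, 3)    # the 3x3 grid with three classes from the example
large = output_shape(13, 3)   # same classes on a 13x13 grid
print(small, total_values(small))
print(large, total_values(large))
```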

Step 5: Train the CNN

The last step is training the convolutional neural network. A typical CNN architecture consists of convolutional layers and max-pooling layers.
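To see how convolution and max-pooling layers lead to a grid-shaped output, the sketch below tracks the spatial size of a feature map through a hypothetical stack of layers. The 416-pixel input and the five conv-plus-pool blocks are assumptions chosen so that the result is a 13×13 grid; they are not the exact YOLO configuration.

```python
def layer_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 416  # assumed input resolution, not stated in the article
sizes = [size]
for _ in range(5):
    # 3x3 convolution, stride 1, padding 1: spatial size is unchanged.
    size = layer_out(size, kernel=3, stride=1, padding=1)
    # 2x2 max pooling, stride 2: spatial size is halved.
    size = layer_out(size, kernel=2, stride=2)
    sizes.append(size)

print(sizes)
```

Each pooling step halves the resolution, so five of them reduce the input by a factor of 32.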

What is Darknet?

Darknet is an open-source framework that supports object detection and image classification tasks using convolutional neural networks.
It is written in C and CUDA.
It is used as the framework for training YOLO, i.e., it defines the architecture of the network.
Darknet is mainly used to implement the YOLO algorithm.
The darknet binary is executable code that can directly perform object detection on an image, a video, a camera feed, or a network video stream.

Advantages over other detectors

Rather than using a two-step method for classification and localization of objects, YOLO applies a single CNN for both classification and localization of the object.
YOLO can process images at about 40-90 FPS, so it is quite fast. This means streaming video can be processed in real time, with a processing time of roughly 11 to 25 milliseconds per frame. The single-network architecture is what makes YOLO this fast. It is reported to be about 1000 times faster than R-CNN and 100 times faster than Fast R-CNN.
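The per-frame processing time implied by a frame rate is simple arithmetic, sketched below for the two ends of the 40-90 FPS range. The function name `frame_time_ms` is hypothetical.

```python
def frame_time_ms(fps):
    """Time available to process one frame, in milliseconds."""
    return 1000.0 / fps


for fps in (40, 90):
    print(f"{fps} FPS -> {frame_time_ms(fps):.1f} ms per frame")
```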

written by: Ganesh Hari

reviewed by: Savya Sachi
