YOLO ("You Only Look Once") is a series of end-to-end deep learning models designed for fast object detection, first introduced by Joseph Redmon et al. in 2015. As images become more complex, building a deep learning model from scratch demands computational resources that are often not available.
Predefined frameworks and pretrained models therefore come in handy. YOLO is one such framework for object detection: it is extremely fast, state-of-the-art, and accurate. YOLO is implemented on Darknet.
YOLO Algorithm
The YOLO architecture is based on a convolutional neural network (CNN), and it can be customized according to the user's requirements.
Step 1: Read the input image
Let C be the number of classes. In the example image, C = 3 and the class labels are C1 = Chair, C2 = Laptop, C3 = Car.
Step 2: Divide the image into MxM cells
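For each grid cell Xij, a label Y is calculated (Xij → Y). The label Y is an N-dimensional vector, where N depends on the number of classes. In this example, with C = 3, Y has 8 elements: an object-confidence value, four bounding-box values, and three class scores.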
The above diagram shows the vector representation of label Y
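As a rough illustration of Step 2, the sketch below finds which grid cell contains the centre of an object and computes the length of the label vector. The function names, the 300x300 image size, and the layout of 5 values per box plus C class scores are assumptions, chosen to agree with the 8-dimensional vector used in this article for C = 3.

```python
def cell_for_point(cx, cy, img_w, img_h, m):
    """Return (row, col) of the M x M grid cell containing pixel (cx, cy)."""
    col = min(int(cx * m / img_w), m - 1)  # clamp points on the right edge
    row = min(int(cy * m / img_h), m - 1)  # clamp points on the bottom edge
    return row, col


def label_length(num_classes, boxes_per_cell=1):
    """Length N of label Y: 5 values per box (confidence + x, y, w, h) plus C class scores."""
    return 5 * boxes_per_cell + num_classes


# Hypothetical object centred at pixel (250, 120) in a 300x300 image, 3x3 grid.
row, col = cell_for_point(cx=250, cy=120, img_w=300, img_h=300, m=3)
print(row, col)         # 1 2
print(label_length(3))  # 8
```

The cell that contains the object's centre is the one whose label describes that object; all other cells keep an "empty" label.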
Step 3: Apply image classification and localization to each grid cell and predict the bounding box
- The (x, y) coordinates represent the centre of the bounding box relative to the grid cell location, and (w, h) are the dimensions of the bounding box. Both are normalized to the range [0, 1].
- Intersection over Union (IoU) is an evaluation metric used to measure the accuracy of an object detector on a dataset: the area of overlap between the predicted box and the ground-truth box, divided by the area of their union.
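The IoU computation can be sketched in a few lines. This is a minimal version, assuming axis-aligned boxes given as (x1, y1, x2, y2) corners; the helper `center_to_corners` and the example boxes are illustrative and not taken from Darknet.

```python
def center_to_corners(x, y, w, h):
    """Convert a centre-format box (x, y, w, h) to corner format (x1, y1, x2, y2)."""
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def iou(box_a, box_b):
    """Intersection over Union of two corner-format boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


truth = (0, 0, 2, 2)  # hypothetical ground-truth box
pred = (1, 1, 3, 3)   # hypothetical predicted box
print(iou(truth, pred))  # overlap 1, union 7, so about 0.143
```

An IoU of 1 means the predicted box matches the ground truth exactly, and 0 means the two boxes do not overlap at all.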
Step 4: Predict the class probabilities of object
Class probabilities are predicted as P(Class | Object). This probability is conditioned on the grid cell containing an object.
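To turn the conditional probabilities into a per-class score, they are commonly multiplied by the cell's object confidence. The sketch below assumes the vector layout [pc, x, y, w, h, c1, ..., cC], where pc is the object confidence; that ordering, the function names, and the sample numbers are assumptions made for illustration.

```python
def class_scores(y):
    """Return pc * P(class_i | object) for each class in label or prediction vector y."""
    pc, class_probs = y[0], y[5:]
    return [pc * p for p in class_probs]


def best_class(y, names):
    """Return the (name, score) pair of the highest-scoring class."""
    scores = class_scores(y)
    k = max(range(len(scores)), key=scores.__getitem__)
    return names[k], scores[k]


names = ["Chair", "Laptop", "Car"]
# Hypothetical prediction for one grid cell: pc, x, y, w, h, then three class probabilities.
pred = [0.9, 0.5, 0.5, 0.3, 0.4, 0.1, 0.7, 0.2]
name, score = best_class(pred, names)
print(name)  # Laptop has the highest score
```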
- The vector Y for the first grid looks like this
- The output of this step is 3x3x8 values, i.e., an 8-dimensional vector is computed for each grid cell.
- In practice, the grid can be much larger, such as 13×13, and the Y vector varies accordingly.
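The 3x3x8 output can be pictured as a nested list with one 8-element vector per cell. The sketch below builds such a label tensor from ground-truth boxes; the function name, the element order [pc, x, y, w, h, c1, c2, c3], and the choice to normalize width and height by the image size are assumptions made for illustration.

```python
def encode_labels(objects, img_w, img_h, grid=3, num_classes=3):
    """Build a grid x grid x (5 + num_classes) label tensor as nested lists.

    objects: iterable of (cx, cy, w, h, class_id) in pixels, class_id starting at 0.
    """
    depth = 5 + num_classes
    labels = [[[0.0] * depth for _ in range(grid)] for _ in range(grid)]
    cell_w, cell_h = img_w / grid, img_h / grid
    for cx, cy, w, h, cls in objects:
        col = min(int(cx / cell_w), grid - 1)
        row = min(int(cy / cell_h), grid - 1)
        x = cx / cell_w - col  # centre offset inside the cell, in [0, 1]
        y = cy / cell_h - row
        one_hot = [1.0 if k == cls else 0.0 for k in range(num_classes)]
        labels[row][col] = [1.0, x, y, w / img_w, h / img_h] + one_hot
    return labels


# Hypothetical laptop (class 1) centred in a 300x300 image.
labels = encode_labels([(150, 150, 90, 60, 1)], img_w=300, img_h=300)
print(len(labels), len(labels[0]), len(labels[0][0]))  # 3 3 8
print(labels[1][1])  # [1.0, 0.5, 0.5, 0.3, 0.2, 0.0, 1.0, 0.0]
print(13 * 13 * (5 + 3))  # 1352 values for a 13x13 grid with 3 classes
```

Cells without an object keep an all-zero vector, so only the centre cell carries a box and a class here.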
Step 5: Train the CNN
The last step is training the convolutional neural network. A typical CNN architecture consists of convolutional layers and max-pooling layers.
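To give a feel for what a max-pooling layer does, here is a minimal sketch of 2x2 max pooling with stride 2 on a single feature map stored as a nested list. It is an illustration only, not Darknet's implementation, and it assumes the feature map has even height and width.

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on a 2-D list with even height and width."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]), 2)]
        for r in range(0, len(fmap), 2)
    ]


fmap = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(max_pool_2x2(fmap))  # [[6, 8], [14, 16]]
```

Each pooling step halves the height and width of the feature map, which is how a large input image is gradually reduced to a coarse grid of cells.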
What is Darknet?
- Darknet is an open-source framework that supports Object Detection and Image Classification tasks in the form of Convolutional Neural Networks.
- It is written in C/CUDA
- It is used as the framework for training YOLO, i.e., it sets the architecture of the network
- Darknet is mainly used to implement the YOLO algorithm
- Darknet builds into an executable that can directly perform object detection on an image, a video, a camera feed, or a network video stream.
Advantages over other detectors
- Rather than using a two-step method for classification and localization of objects, YOLO applies a single CNN for both classification and localization of the object.
- YOLO can process images at about 40-90 FPS, so streaming video can be processed in real time, with a per-frame processing time of roughly 11 to 25 milliseconds. It is about 1000 times faster than R-CNN and 100 times faster than Fast R-CNN.
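The relationship between frame rate and per-frame time is simple arithmetic, sketched below. The function name is illustrative, and the calculation covers processing time only, not camera or network delays.

```python
def frame_time_ms(fps):
    """Time available to process one frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


for fps in (40, 90):
    print(f"{fps} FPS -> {frame_time_ms(fps):.1f} ms per frame")
# 40 FPS -> 25.0 ms per frame
# 90 FPS -> 11.1 ms per frame
```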
written by: Ganesh Hari
reviewed by: Savya Sachi