AlexNet: ImageNet Classification With Deep CNN

A Brief Introduction to CNNs

Before we look at what AlexNet is and how it works, we need a basic idea of Convolutional Neural Networks (CNNs), since AlexNet is built on this architecture. Convolutional Neural Networks are a type of neural network designed specifically for computer vision tasks.

Their applications include classification, image reconstruction, segmentation, and even natural language processing and time series forecasting. They differ from conventional neural networks (Artificial Neural Networks/ANNs): ANNs are rarely used for computer vision because the number of parameters that would have to be learned for high spatial resolution images is prohibitively large.

These networks can automatically extract features from images (e.g., a tail is a feature in an image of a dog). In a CNN, the image passes through a series of Convolution and Pooling layers and, in the end, through Fully Connected layers with activations such as 'relu' and 'sigmoid'.

Architecture of CNNs

Convolutions are performed with filters/kernels of odd size (3×3, 5×5, …). The filter is slid across the image; at each position, the sum of the element-wise products of the filter weights and the underlying image pixels replaces the centre pixel. A 10×10 image (10×10×3, if RGB) convolved with a 3×3 filter gives an 8×8 output, assuming the stride is 1.
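
As a rough illustration, here is a minimal single-channel 'valid' convolution in NumPy (the function name and the random inputs are only for illustration):

import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Minimal 'valid' 2D convolution sketch (single channel, no padding)."""
    f = kernel.shape[0]
    out_h = (image.shape[0] - f) // stride + 1
    out_w = (image.shape[1] - f) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + f, j * stride:j * stride + f]
            out[i, j] = np.sum(patch * kernel)  # element-wise multiply, then sum
    return out

image = np.random.rand(10, 10)   # a 10x10 single-channel image
kernel = np.random.rand(3, 3)    # a 3x3 filter
print(conv2d_valid(image, kernel).shape)  # (8, 8)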

Stride refers to the number of pixels the filter shifts after each convolution step. Padding is done to avoid loss of information, since edge pixels would otherwise be lost after this operation; it is the concept of adding rows/columns of zeros at the edges of an image. For any layer with an 'f×f' filter size, 'p' padding, and 's' stride, the output dimension of the layer is given by:

n[l] = floor( (n[l-1] + 2p - f) / s ) + 1
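
The same formula as a small helper function (a sketch; the function name is just illustrative):

def conv_output_size(n_prev, f, p=0, s=1):
    """Output spatial size of a conv/pool layer: floor((n + 2p - f)/s) + 1."""
    return (n_prev + 2 * p - f) // s + 1

print(conv_output_size(10, 3))        # 8  -> 10x10 image, 3x3 filter, stride 1
print(conv_output_size(10, 3, p=1))   # 10 -> padding of 1 keeps the size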

Pooling is used to reduce the size of the image. The image is divided into sub-images of the dimension specified for this layer, and each sub-image is replaced by the maximum of its pixels; this is referred to as Max-Pooling. If the previous (convolved) layer is of size 32×32, then after a 2×2 Max-Pooling layer the output image will be of size 16×16.
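
A minimal sketch of non-overlapping max-pooling in NumPy (again, names and inputs are illustrative):

import numpy as np

def max_pool(image, size=2):
    """Minimal non-overlapping max-pooling sketch (stride equals the pool size)."""
    h, w = image.shape[0] // size, image.shape[1] // size
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = image[i * size:(i + 1) * size, j * size:(j + 1) * size].max()
    return out

x = np.random.rand(32, 32)
print(max_pool(x).shape)  # (16, 16)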

Convolutions are followed by Max-Pooling layers: after one or two convolutions there is a pooling layer. Finally, after a few of these stacked blocks, the output is passed through a Fully-Connected layer that learns from the extracted features rather than from the raw image.

AlexNet

AlexNet, named after Alex Krizhevsky, is a deep CNN trained on ImageNet, a dataset of over 15 million images belonging to roughly 22,000 categories. The network uses a subset of ImageNet with roughly 1000 images in each of 1000 categories: about 1.2 million training images, 50,000 validation images, and 150,000 testing images.

The images were down-sampled to a fixed resolution of 256 × 256: each image was first resized so that its shorter side had length 256, and then the central 256×256 patch was cropped out of the resulting image. No other preprocessing was applied, except for subtracting the mean activity over the training set from each pixel.
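
A hedged sketch of this preprocessing with torchvision; the per-channel means below are the commonly used ImageNet values and stand in for the per-pixel mean image subtracted in the paper:

from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(256),      # shorter side becomes 256
    transforms.CenterCrop(256),  # central 256x256 patch
    transforms.ToTensor(),
    # Illustrative per-channel mean subtraction, std of 1 means no rescaling.
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[1.0, 1.0, 1.0]),
])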

Architecture of AlexNet

It contains eight consecutive layers: five of them are convolutional and the remaining three are fully connected. Given below are some of the key features of AlexNet.
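
A rough single-GPU sketch of this layer stack in PyTorch is given below; the filter counts follow the paper, while the padding values and the assumed 227×227 input are illustrative choices:

import torch.nn as nn

alexnet = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    # Assumes a 227x227 input so that the flattened size is 256 * 6 * 6.
    nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),  # 1000 ImageNet classes
)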

1. ‘ReLU’ activations

Instead of using the conventional 'tanh' or 'sigmoid' non-linearities, it uses the function f(x) = max(0, x), which works much faster with gradient descent, because 'tanh' and 'sigmoid' non-linearities take much longer to converge. Neurons with this nonlinearity are called Rectified Linear Units (ReLUs).
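
The activation itself is trivial to write down, for example in NumPy:

import numpy as np

def relu(x):
    """ReLU non-linearity: f(x) = max(0, x), applied element-wise."""
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]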

Hence, networks with ReLUs train several times faster than their equivalents with tanh units; in the paper's experiments, using ReLU activations made the network roughly 6 times faster to reach the same training error.

2. Training on Multiple GPUs

The network was trained on two GPUs (GTX 580s), each with 3 GB of memory. The 1.2 million training examples were too large to train the network on a single GPU, so the network was spread across two GTX 580s. The parallelization scheme used here puts half of the kernels (or neurons) on each GPU, and the GPUs communicate only in certain layers.

This means that the kernels of layer 3 take input from all kernel maps in layer 2, whereas kernels in layer 4 take input only from those kernel maps in layer 3 that reside on the same GPU. This connectivity pattern could be tuned until the amount of communication became an acceptable fraction of the amount of computation.
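
A hedged PyTorch sketch of the underlying idea, with half of a layer's kernels on each device and feature maps exchanged only at a "communicating" layer (device names and layer sizes are illustrative and require two GPUs to run):

import torch
import torch.nn as nn

conv_on_gpu0 = nn.Conv2d(3, 48, kernel_size=11, stride=4).to("cuda:0")
conv_on_gpu1 = nn.Conv2d(3, 48, kernel_size=11, stride=4).to("cuda:1")

x = torch.randn(1, 3, 227, 227)
out0 = conv_on_gpu0(x.to("cuda:0"))  # 48 feature maps computed on GPU 0
out1 = conv_on_gpu1(x.to("cuda:1"))  # 48 feature maps computed on GPU 1

# At a communicating layer, feature maps from both GPUs are brought together.
combined = torch.cat([out0, out1.to("cuda:0")], dim=1)  # 96 maps on GPU 0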

3. Local Response Normalization

With ReLUs, the network does not require input normalization to prevent the neurons from saturating: as long as some training examples produce a positive input to a ReLU, that neuron will keep learning. Even so, the authors found that applying Local Response Normalization after certain layers still aids generalization.
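
For reference, a sketch of local response normalization as exposed by PyTorch; the hyper-parameters follow the values reported in the paper (n = 5, alpha = 1e-4, beta = 0.75, k = 2), though library conventions for alpha can differ slightly:

import torch
import torch.nn as nn

lrn = nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0)
activations = torch.relu(torch.randn(1, 96, 55, 55))  # e.g. output of the first conv layer
normalized = lrn(activations)
print(normalized.shape)  # torch.Size([1, 96, 55, 55])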

4. Overlapping Pooling

Traditionally, the pooling windows in CNNs are adjacent and do not overlap. Here, overlapping pooling is used: the pooling window (3×3) is larger than the stride (2), so neighbouring windows overlap.
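
A short comparison in PyTorch (the layer sizes are illustrative):

import torch
import torch.nn as nn

plain_pool = nn.MaxPool2d(kernel_size=2, stride=2)    # non-overlapping: window equals stride
overlap_pool = nn.MaxPool2d(kernel_size=3, stride=2)  # overlapping, as in AlexNet

x = torch.randn(1, 96, 55, 55)
print(plain_pool(x).shape)    # torch.Size([1, 96, 27, 27])
print(overlap_pool(x).shape)  # torch.Size([1, 96, 27, 27])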

Image: Architecture of a Convolutional Neural Network

Solving the problem of Overfitting

  • Data Augmentation:-

Data augmentation is the process of creating new data from the available data; the new data is generally a cropped, rotated, flipped, tilted, scaled, or otherwise transformed version of the existing data. This produces more training data and helps reduce overfitting. Two such operations are used here (a short sketch follows the list below):

  1. Image translations and Horizontal Reflections.
  2. Changes in the intensity of the RGB channels.
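
A hedged torchvision sketch of these two augmentations; the crop size and the use of ColorJitter in place of the paper's PCA-based intensity change are illustrative substitutions:

from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomCrop(224),         # random translations via random crops
    transforms.RandomHorizontalFlip(),  # horizontal reflections
    transforms.ColorJitter(brightness=0.4, saturation=0.4),  # RGB intensity changes
    transforms.ToTensor(),
])
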
  • Dropout Regularization:-

Dropout Regularization refers to randomly dropping some of the neurons of a hidden layer in a neural network, which can significantly reduce overfitting. Each affected layer has a probability according to which its neurons are dropped; here the probability is set to 0.5, and dropout is applied to the first two fully connected layers of the network.
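
A minimal PyTorch sketch of dropout on a fully connected layer (the layer sizes are illustrative):

import torch
import torch.nn as nn

fc = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # each neuron's output is zeroed with probability 0.5 during training
)

x = torch.randn(8, 4096)
fc.train()          # dropout is active only in training mode
print(fc(x).shape)  # torch.Size([8, 4096])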

Image: Showing Dropout Regularization

Conclusion

Convolutional Neural Networks are among the most widely used deep learning algorithms. AlexNet is used in many real-world scenarios because of its very high accuracy, and Transfer Learning works well with these networks. They are so popular that pre-trained versions are included in frameworks such as MATLAB.

References

Most of the data and information in this article comes from “ImageNet Classification with Deep Convolutional Neural Networks” by Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton.

Written By: Soumya Ranjan Acharya

Reviewed By: Rushikesh Lavate

