Image Classification

Convolutional Neural Networks

The input to a convolutional neural network (ConvNN) passes through a series of filters to produce the predicted output. A ConvNN must contain at least one convolutional layer, which allows it to process more complex images and videos than a regular neural network can. Over the last decade the use of ConvNNs has grown greatly; examples include Facebook’s automatic tagging system and image recognition on the ImageNet dataset.


The architecture of a ConvNN is loosely modeled on the connectivity of neurons in the human brain.

The ConvNN is a happy medium between biologically plausible networks and simple perceptrons.

A Convolutional Neural Network is a Deep Learning algorithm. On image tasks its accuracy is typically much higher than that of simpler networks.

The main reason for the success of the ConvNN is the concept of shared weights. Despite their large size, these networks have a small number of trainable parameters compared to their ancestor. There are variants of the ConvNN (Tiled Convolutional Neural Network) that resemble the neocognitron: in such networks weight sharing is partially abandoned, but the learning algorithm remains the same and is based on back-propagation of the error. A ConvNN can run quickly on a sequential machine and can be trained quickly, because the convolution over each feature map can be parallelized, as can the reverse convolution used when propagating the error back through the network.
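To make the effect of shared weights concrete, here is a small sketch that counts trainable parameters. The layer sizes (a 28x28 grayscale input, 16 feature maps, 3x3 kernels) are assumptions chosen only for illustration, and the helper names are hypothetical. It compares a convolutional layer with a fully connected layer that produces an output of the same size.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus one bias per output channel of a convolutional layer."""
    return (kernel_h * kernel_w * in_channels + 1) * out_channels


def dense_params(n_inputs, n_outputs):
    """Weights plus one bias per output unit of a fully connected layer."""
    return (n_inputs + 1) * n_outputs


# Assumed example sizes, not taken from the article.
height, width, in_channels, out_channels = 28, 28, 1, 16
kernel = 3

# Output size of a 3x3 convolution with stride 1 and no padding.
out_h, out_w = height - kernel + 1, width - kernel + 1

shared = conv_params(kernel, kernel, in_channels, out_channels)
unshared = dense_params(height * width * in_channels,
                        out_h * out_w * out_channels)

print(shared)    # 160
print(unshared)  # 8490560
```

Under these assumptions the convolutional layer needs 160 parameters, while a fully connected layer producing the same number of outputs needs about 8.5 million.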

The ConvNN

Let’s imagine an image of size 5x5x1 (Height x Width x Channels). In the image shown below, the input image I is shown in green. In the convolutional layer, the first operation is carried out by the kernel (filter) K, shown in yellow, which has a size of 3x3x1.

The kernel shifts one cell at a time (Stride = 1). At each position it multiplies K element-wise with the portion of the image it currently covers and sums the products. The kernel repeats this operation until the whole image has been traversed.

The kernel always has the same depth (number of channels) as the input image; in this instance both have a depth of 1. The products of K and the covered portion of I are summed together with a bias to produce a single-channel Convolved Feature output.
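The sliding-window operation can be sketched in plain Python as follows. This is a minimal illustration, not an optimized implementation: the function name convolve2d and the pixel and kernel values are assumptions made up for the example, and only a single channel is handled.

```python
def convolve2d(image, kernel, bias=0, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (img_h - k_h) // stride + 1
    out_w = (img_w - k_w) // stride + 1

    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise product of the kernel and the covered patch,
            # summed together with the bias.
            total = bias
            for a in range(k_h):
                for b in range(k_w):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


# Assumed 5x5x1 input image I and 3x3x1 kernel K (illustrative values).
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
# [4, 3, 4]
# [2, 4, 3]
# [2, 3, 4]
```

With a 5x5 input, a 3x3 kernel and Stride = 1, the kernel fits in three positions along each axis, so the feature map is 3x3.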

The aim of the ConvNN is to extract features from the input; with each additional layer the architecture adapts to higher-level features. The convolution operation can produce two kinds of output: one whose dimensions are reduced compared to the input, and one whose dimensions are increased or remain the same. The first results when no padding is applied (Valid Padding); the second is achieved with Same Padding.

Same Padding

With Same Padding, the 5x5x1 matrix is padded with a one-cell border of zeros, which increases its size to 7x7x1. Applying the 3x3x1 kernel with Stride = 1 then yields a result of size 5x5x1, the same as the original input, hence the name Same Padding.
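The size arithmetic behind padding can be checked with a short sketch. The helper names zero_pad and output_size are hypothetical, and the sketch assumes a square input, an odd kernel size and zero padding applied equally on every side. For a 3x3 kernel with Stride = 1, one cell of padding per side turns a 5x5 input into a 7x7 one, and the convolution brings it back to 5x5.

```python
def zero_pad(image, pad):
    """Surround a 2D image with `pad` rows/columns of zeros on every side."""
    padded_width = len(image[0]) + 2 * pad
    top = [[0] * padded_width for _ in range(pad)]
    middle = [[0] * pad + list(row) + [0] * pad for row in image]
    bottom = [[0] * padded_width for _ in range(pad)]
    return top + middle + bottom


def output_size(n, kernel, pad, stride=1):
    """Side length of the feature map for an n x n input."""
    return (n + 2 * pad - kernel) // stride + 1


image = [[1] * 5 for _ in range(5)]   # 5x5 input filled with ones
kernel_size = 3
same_pad = (kernel_size - 1) // 2     # padding that preserves the size at stride 1

padded = zero_pad(image, same_pad)
print(len(padded), len(padded[0]))                 # 7 7
print(output_size(5, kernel_size, pad=0))          # 3  (no padding)
print(output_size(5, kernel_size, pad=same_pad))   # 5  (Same Padding)
```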

The Pooling layer

The aim of this layer is to reduce the spatial dimensions of the previous layer's output. Once the earlier layers have extracted detailed features, that level of detail is no longer required for further operations, so the feature map is compressed into a less detailed one.

There are two types of pooling: Max Pooling and Average Pooling. Max Pooling returns the maximum value of the region covered by the filter, while Average Pooling returns the average of all values in that region. In practice, Max Pooling is usually preferred over Average Pooling.

Max and Average Pooling
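Both pooling variants can be sketched with one small function. The name pool2d, the 2x2 window with a stride of 2, and the feature-map values are assumptions chosen for illustration.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Apply max or average pooling to a 2D feature map."""
    if mode not in ("max", "average"):
        raise ValueError("mode must be 'max' or 'average'")

    height, width = len(feature_map), len(feature_map[0])
    output = []
    for i in range(0, height - size + 1, stride):
        row = []
        for j in range(0, width - size + 1, stride):
            # Collect the values currently covered by the pooling window.
            window = [feature_map[i + a][j + b]
                      for a in range(size) for b in range(size)]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        output.append(row)
    return output


# Assumed 4x4 feature map (illustrative values).
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 5, 6],
]

print(pool2d(feature_map, mode="max"))      # [[6, 4], [7, 9]]
print(pool2d(feature_map, mode="average"))  # [[3.75, 2.25], [4.0, 5.25]]
```

In both cases the 4x4 feature map is reduced to 2x2; the two variants differ only in how each window is summarized.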

Fully Connected Neural Networks

The fully connected layer is the last type of layer in the network and works like a multilayer perceptron. The goal of the layer is classification. The layer models a complex non-linear function, and optimizing this function improves the accuracy of the predictions. Over several epochs of training with backpropagation, the model gets better at classification and prediction.
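The forward pass of such a classification layer can be sketched as follows. This is only an illustration under stated assumptions: the pooled values, the weights and the biases are made up, softmax is assumed as the output function, and training with backpropagation is not shown.

```python
import math


def flatten(feature_map):
    """Turn a 2D feature map into a flat list of inputs."""
    return [value for row in feature_map for value in row]


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtracted for numerical stability
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Assumed pooled feature map and hand-picked weights for two classes.
pooled = [[6, 4], [7, 9]]
weights = [
    [0.1, -0.2, 0.05, 0.0],   # class 0
    [-0.1, 0.2, 0.0, 0.1],    # class 1
]
biases = [0.0, 0.0]

probabilities = softmax(dense(flatten(pooled), weights, biases))
predicted_class = probabilities.index(max(probabilities))
print(probabilities, predicted_class)
```

During training, backpropagation would adjust the weights and biases so that the probability of the correct class increases from epoch to epoch.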