This paper, presented two years ago, describes another approach to learning a higher level of structure by leveraging an unsupervised learning algorithm (k-means).

Please note that this post is for my future self to look back and review the materials on this paper without reading it all over again.

**Abstract**

Labeling data is a labor-intensive process, so in this work the authors leverage an unsupervised learning algorithm to relax this constraint. They train a deep convolutional neural network based on an enhanced version of the k-means clustering algorithm, which reduces the number of correlated parameters (in the form of similar filters) and thus increases test categorization accuracy. With this method they achieve good accuracy on the STL-10 and MNIST data sets.

**Introduction**

A large amount of data is needed for a network to perform well, but labeling data is labor intensive, so some method is needed to leverage unlabeled data. (Previously, unsupervised pre-training was needed to train very deep neural networks.) In this work the authors propose a refined version of an unsupervised clustering algorithm that allows the filters to learn diverse features. (It uses sparse connections and prevents the algorithm from learning redundant filters that are essentially shifted versions of each other.)

**Related Work**

Different unsupervised learning methods exist to learn the filters, but they ignore the fact that the filters will be applied in a convolutional manner, which can result in the filters learning duplicated feature detectors. To overcome this, convolutional Restricted Boltzmann Machines with contrastive divergence and convolutional sparse coding were proposed. Learning filters with the k-means algorithm has gained attention recently; however, little attention has been paid to reducing the redundancy among the learned filters. (Smaller efforts such as random connections or grouping similar features have been proposed, but they did not provide significant improvements.) In this work the authors solve this problem by devising an optimized learning algorithm that avoids replication of similar filters.

**Learning Filters**

The k-means algorithm learns a dictionary D (N×K) from a data matrix W (N×M) using the algorithm above. (s is the code vector associated with an input w, D(j) is the j'th column of the dictionary D, the matrix W has dimensions N×M, and S has dimensions K×M.) However, with the k-means learning algorithm there will be redundancy among the learned filters, hence the authors propose convolutional k-means.
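The dictionary-learning step can be sketched as spherical k-means in the notation above (W is the N×M data matrix, D the N×K dictionary, S the K×M code matrix with one nonzero entry per column). This is a minimal NumPy sketch of that standard formulation, not the authors' implementation; the iteration count, initialization, and empty-cluster handling are assumed details:

```python
import numpy as np

def kmeans_dictionary(W, K, iters=10, seed=0):
    """Spherical k-means dictionary learning: W is N x M data,
    D is the N x K dictionary, S is the K x M one-hot code matrix.
    A minimal sketch, not the authors' implementation."""
    rng = np.random.default_rng(seed)
    N, M = W.shape
    # Initialize centroids from random data columns, unit-normalized.
    D = W[:, rng.choice(M, size=K, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-8
    cols = np.arange(M)
    for _ in range(iters):
        sims = D.T @ W                        # K x M similarity scores
        assign = np.argmax(np.abs(sims), axis=0)
        S = np.zeros((K, M))
        S[assign, cols] = sims[assign, cols]  # one nonzero per column
        Dnew = W @ S.T                        # accumulate assigned data
        norms = np.linalg.norm(Dnew, axis=0)
        nonempty = norms > 1e-8
        # Empty clusters keep their old (already normalized) centroid.
        D[:, nonempty] = Dnew[:, nonempty] / norms[nonempty]
    return D, S
```

Because each column of S has a single nonzero entry, every data column contributes to exactly one centroid per iteration, which is what makes shifted near-duplicates of a feature end up as separate filters.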

The convolutional k-means learning algorithm chooses a much bigger image patch to perform clustering on; specifically, windows two times bigger than the filter size are randomly selected from the input images. Each centroid is then convolved over the entire window to compute a similarity metric at every location within the extracted area, and the location with the biggest activation value is taken as the feature most similar to that centroid. That patch is extracted from the window and assigned to the corresponding centroid. (The modified algorithm can be seen below.)
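The windowed assignment step described above can be sketched as follows. This is a simplified, unoptimized illustration of the idea (single-channel square windows, naive loops); the function and parameter names are my own, not the authors' code:

```python
import numpy as np

def conv_kmeans_step(windows, D, fsize):
    """One pass of the convolutional k-means idea: each centroid is
    slid over a window (chosen 2x the filter size), the patch with
    the largest activation is extracted, and that patch is assigned
    to the winning centroid before the centroids are updated."""
    K = D.shape[1]
    sums = np.zeros_like(D)
    counts = np.zeros(K)
    for win in windows:                       # each win: e.g. (2f, 2f)
        best_act, best_k, best_patch = -np.inf, 0, None
        H = win.shape[0] - fsize + 1          # valid positions per axis
        for k in range(K):
            f = D[:, k].reshape(fsize, fsize)
            # Naive valid cross-correlation of centroid k over the window.
            acts = np.array([[np.sum(win[i:i+fsize, j:j+fsize] * f)
                              for j in range(H)] for i in range(H)])
            i, j = np.unravel_index(np.argmax(acts), acts.shape)
            if acts[i, j] > best_act:
                best_act, best_k = acts[i, j], k
                best_patch = win[i:i+fsize, j:j+fsize]
        sums[:, best_k] += best_patch.ravel()
        counts[best_k] += 1
    updated = counts > 0
    D[:, updated] = sums[:, updated] / counts[updated]
    # Re-normalize so centroids stay unit-length filters.
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-8
    return D
```

Because the best-matching patch inside the larger window is extracted rather than a fixed crop, two centroids can no longer both "win" on shifted copies of the same feature, which is what suppresses the redundant filters.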

As seen below, when we compare the filters learned by the two algorithms, we can observe redundancy among the filters learned via plain k-means, while convolutional k-means learns a more essential basis.

To compare performance, the authors use the STL-10 data set, learning the filters from the unlabeled images only. (For the encoding scheme they apply global contrast normalization to the input images.)
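Global contrast normalization, as used for the encoding step, amounts to centering each image to zero mean and scaling it to unit standard deviation. A minimal sketch, with the small eps stabilizer being an assumed detail:

```python
import numpy as np

def global_contrast_normalize(X, eps=1e-8):
    """Per-image contrast normalization: each row of X is one
    flattened image; subtract its mean and divide by its standard
    deviation. eps avoids division by zero on constant images."""
    X = X.astype(float)                      # work on a float copy
    X -= X.mean(axis=1, keepdims=True)       # zero mean per image
    X /= X.std(axis=1, keepdims=True) + eps  # unit std per image
    return X
```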

As seen above, when the authors compare a network whose filters were learned via plain k-means against one using convolutional k-means, the convolutional k-means network performs better. This indicates that its filters do a much better job of extracting useful information from a given image.

**Learning Connections**

The authors also study a method to learn the connections between layers. While fully connected architectures use all of the features from the previous layer, non-complete connections are more efficient computationally. (The authors use a sparse connection matrix to achieve this.) By limiting the connections within the receptive field, it is possible to scale the algorithm.
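A toy calculation illustrates why a non-complete connection table is cheaper: the multiply-accumulate count of a convolutional layer scales with the number of ones in the binary connection matrix. The map counts, map size, and filter size below are assumptions for illustration, not the paper's architecture:

```python
import numpy as np

def conv_mac_count(C, out_hw, fsize):
    """Multiply-accumulates for a conv layer gated by a binary
    connection matrix C (out_maps x in_maps): each active
    input->output pair costs one fsize x fsize filter applied at
    every position of the out_hw x out_hw output map."""
    return int(C.sum()) * out_hw * out_hw * fsize * fsize

# Full connectivity vs. a sparse table keeping 4 inputs per output map.
rng = np.random.default_rng(0)
full = np.ones((64, 32), dtype=int)
sparse = np.zeros((64, 32), dtype=int)
for o in range(64):
    sparse[o, rng.choice(32, size=4, replace=False)] = 1

print(conv_mac_count(full, 24, 5))    # cost with complete connections
print(conv_mac_count(sparse, 24, 5))  # 8x fewer operations here
```

With 4 of 32 inputs kept per output map, the sparse layer does exactly 1/8 of the work of the fully connected one, which is the kind of saving the authors exploit.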

Using the architecture seen above, the authors first add a convolutional layer with a predefined non-complete connection, attach a linear classifier after the convolutional layer, and train the system with back-propagation. In this phase the connection matrix is learned via supervised learning; when training is done, only the connection matrix is kept, and the convolutional filters are learned via the convolutional k-means algorithm.

As seen above, when both convolutional layers are learned via the convolutional k-means algorithm, the network produces the best result.

Additionally, we can again confirm how effective the convolutional k-means algorithm is: when the network is trained via back-propagation alone, it overfits to the training data and hence performs very poorly on the test image set.

**Final Classification Results**

Finally, the authors compare their method to other state-of-the-art algorithms on the STL-10 and MNIST data sets.

As seen above, when we compare against other unsupervised filter-learning methods, the authors' method dramatically increases test accuracy. (Note that for the above results the authors used a multi-dictionary approach rather than a single dictionary.)

Even on the MNIST data set, the authors' method achieved the lowest error rates.

**Conclusion**

In conclusion, the authors propose a learning algorithm that combines the strengths of both supervised and unsupervised learning. They modify the k-means learning algorithm to reduce the redundant filters learned by the network, and they additionally propose a learning setup for finding the proper connections between layers. This method is simpler than, and performs better than, other unsupervised learning methods.

**Final Words**

One very good advantage of this method is that we do not need to whiten the data before performing k-means clustering. However, from decorrelated batch normalization we already know that it is certainly possible to create a whitening layer.

**Reference**

- Dundar, A., Jin, J., & Culurciello, E. (2015). Convolutional Clustering for Unsupervised Learning. arXiv:1511.06241. Retrieved 20 August 2018, from https://arxiv.org/abs/1511.06241