Logos, sometimes also known as trademarks, are highly important in today’s marketing world. Products, companies and different gaming leagues are often recognized by their respective logos. Logo recognition in images and videos is a key problem in a wide range of applications, such as copyright infringement detection, vehicle logo recognition for intelligent traffic-control systems, augmented reality, contextual advertisement placement and others. In this post we will look at how to use SSD from the Tensorflow Object Detection API to detect and localize brand logos in images of a T.V. show (Big Boss India). The task is to detect and localize six brand logos — fizz, oppo, samsung, garnier, faber and cpplus — in images of the show.
What is SSD and how does it work?
According to the paper, SSD: Single Shot MultiBox Detector is a method for detecting objects in images using a single deep neural network. SSD discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. Experimental results on the PASCAL VOC, COCO, and ILSVRC datasets confirm that SSD has accuracy competitive with methods that utilize an additional object proposal step while being much faster, and it provides a unified framework for both training and inference.
The tasks of object localization and classification are done in a single forward pass of the network.
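To make the default-box idea more concrete, here is a small sketch. The scale formula follows the SSD paper (s_k = s_min + (s_max − s_min)(k − 1)/(m − 1)); the function names and the particular aspect-ratio set are my own choices for illustration, not taken from any library.

```python
import math

def default_box_scales(num_feature_maps, s_min=0.2, s_max=0.9):
    """Scale s_k of the default boxes on the k-th feature map,
    spaced evenly between s_min and s_max as in the SSD paper."""
    m = num_feature_maps
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1)
            for k in range(1, m + 1)]

def default_boxes_at(cx, cy, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Default boxes centred at (cx, cy) in normalized coordinates:
    width = s * sqrt(a), height = s / sqrt(a) for each aspect ratio a."""
    return [(cx, cy, scale * math.sqrt(a), scale / math.sqrt(a))
            for a in aspect_ratios]
```

With six feature maps this yields scales from 0.2 (finest map, small objects) up to 0.9 (coarsest map, large objects), and at every feature-map location several boxes of different shapes are scored and regressed at once.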
Architecture of SSD
Logo Detection Dataset
Data for this task was obtained by capturing individual frames from a video clip of the show. A total of 6267 images were captured. I used 600 images for testing and the rest for training.
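A simple way to carve out such a held-out set is a deterministic shuffle-and-slice. This helper is my own sketch (the post does not specify how the split was done), using only the standard library:

```python
import random

def split_dataset(image_paths, test_size=600, seed=42):
    """Shuffle the file list deterministically and hold out
    `test_size` images for testing; the rest are for training."""
    paths = sorted(image_paths)          # sort first so the split is reproducible
    random.Random(seed).shuffle(paths)   # fixed seed -> same split every run
    return paths[test_size:], paths[:test_size]
```

Fixing the seed matters: it guarantees the same frames end up in the test set every time the script is re-run, so no test image ever leaks into training.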
Now, the next step was to annotate the images obtained. To do that, I used LabelImg. LabelImg is a graphical image annotation tool. The annotations produced are saved as XML files in PASCAL VOC format. The installation instructions are given in the repository.
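For reference, a PASCAL VOC annotation as produced by LabelImg looks roughly like this — one `<object>` element per labeled logo (the filename, sizes and box coordinates below are made up for illustration):

```xml
<annotation>
	<folder>images</folder>
	<filename>frame_0001.jpg</filename>
	<size>
		<width>1280</width>
		<height>720</height>
		<depth>3</depth>
	</size>
	<object>
		<name>oppo</name>
		<bndbox>
			<xmin>10</xmin>
			<ymin>20</ymin>
			<xmax>110</xmax>
			<ymax>80</ymax>
		</bndbox>
	</object>
</annotation>
```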
Similarly, I had to go through all the images of the dataset and annotate them individually.
What is a TFRecord?
If we are working with large datasets, using a binary file format to store the data can have a significant impact on the performance of our import pipeline and, as a consequence, on the training time of our model. Binary data takes up less space on disk, takes less time to copy and can be read much more efficiently from disk. This is where a TFRecord comes in.

However, raw performance isn’t the only advantage of the TFRecord file format. It is optimized for use with Tensorflow in multiple ways. To start with, it makes it easy to combine multiple datasets, and it integrates seamlessly with the data import and preprocessing functionality provided by the library. This is especially an advantage for datasets that are too large to fit in memory, as only the data required at the time (e.g. a batch) is loaded from disk and then processed. Since we are going to work with the Tensorflow API, we will convert our XML files to TFRecords.
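The TFRecord container itself is Tensorflow-specific, but the space saving of binary serialization is easy to demonstrate with the standard library alone. The numbers below are illustrative, not TFRecord measurements:

```python
import struct

# 1000 float values, standing in for e.g. bounding-box coordinates
values = [i / 7.0 for i in range(1000)]

# Text encoding: comma-separated decimal strings, as a CSV cell would store them
as_text = ",".join(repr(v) for v in values).encode("utf-8")

# Binary encoding: a flat buffer of 32-bit floats, 4 bytes per value
as_binary = struct.pack("%df" % len(values), *values)

print(len(as_text), len(as_binary))  # the binary buffer is several times smaller
```

On top of being smaller, a fixed-width binary layout like this can be read back without any string parsing, which is part of why binary input pipelines are faster to consume during training.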
Converting XML to TFRecord
To convert the XML files to TFRecords, we will first convert them to CSV using a Python script, thanks to this repository. There are some minor changes that need to be introduced. Here is the code that converts your XML files to CSV files.
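A minimal sketch of such a converter, using only the standard library, is shown below. The function names `xml_to_rows` and `convert_folder` are my own; the column layout (filename, image size, class, box corners) is the one commonly used with the Tensorflow Object Detection API's TFRecord generation scripts:

```python
import csv
import glob
import os
import xml.etree.ElementTree as ET

def xml_to_rows(xml_path):
    """Return one (filename, width, height, class, xmin, ymin, xmax, ymax)
    tuple per <object> element in a PASCAL VOC annotation file."""
    root = ET.parse(xml_path).getroot()
    filename = root.find("filename").text
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append((filename, width, height, obj.find("name").text,
                     int(box.find("xmin").text), int(box.find("ymin").text),
                     int(box.find("xmax").text), int(box.find("ymax").text)))
    return rows

def convert_folder(xml_dir, csv_path):
    """Collect every *.xml annotation in xml_dir into a single CSV file."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "width", "height", "class",
                         "xmin", "ymin", "xmax", "ymax"])
        for xml_file in sorted(glob.glob(os.path.join(xml_dir, "*.xml"))):
            writer.writerows(xml_to_rows(xml_file))
```

Run once for the training annotations and once for the test annotations, producing two CSV files that the TFRecord generation step then consumes.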