Logos, sometimes also known as trademarks, are highly important in today’s marketing world. Products, companies and even gaming leagues are often recognized by their logos. Logo recognition in images and videos is a key problem in a wide range of applications, such as copyright infringement detection, vehicle logo recognition for intelligent traffic-control systems, augmented reality, contextual advertisement placement and others. In this post we will look at how to use SSD from the TensorFlow Object Detection API to detect and localize brand logos in images from a TV show (Big Boss India). The task is to detect and localize six brand logos in frames from the show: fizz, oppo, samsung, garnier, faber and cpplus.

What is SSD and how does it work?

According to the SSD paper (SSD: Single Shot MultiBox Detector), SSD is a method for detecting objects in images using a single deep neural network. SSD discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. Experimental results on the PASCAL VOC, COCO, and ILSVRC datasets confirm that SSD has competitive accuracy compared to methods that use an additional object proposal step and is much faster, while providing a unified framework for both training and inference.

Object localization and classification are both done in a single forward pass of the network.
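To make the idea of default boxes more concrete, here is a minimal sketch of how they could be generated for one square feature map. It follows the formulas in the SSD paper (linearly spaced scales, width = scale * sqrt(aspect ratio), height = scale / sqrt(aspect ratio)), not the anchor generator of the TensorFlow API. The function names and the choice of three aspect ratios are assumptions made for illustration, and the extra box the paper adds for aspect ratio 1 is left out.

```python
import math


def default_box_scales(num_maps, s_min=0.2, s_max=0.9):
    """Linearly spaced box scales, one per feature map (smallest first)."""
    if num_maps == 1:
        return [s_min]
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Default boxes (cx, cy, w, h) in relative [0, 1] image coordinates.

    One box per aspect ratio is centred on every cell of a square
    fmap_size x fmap_size feature map.
    """
    boxes = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            cx = (col + 0.5) / fmap_size
            cy = (row + 0.5) / fmap_size
            for ar in aspect_ratios:
                width = scale * math.sqrt(ar)
                height = scale / math.sqrt(ar)
                boxes.append((cx, cy, width, height))
    return boxes
```

For every one of these boxes the network predicts one score per class plus four offsets, which is why detection needs only one pass.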

Multibox concept in SSD (Source: Udemy A-Z Computer Vision)

Architecture of SSD


Logo Detection Dataset

Data for this task was obtained by capturing individual frames from a video clip of the show. A total of 6267 images were captured. I used 600 images for testing and the rest for training.
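The post does not say how the 600 test images were chosen. As one possible approach, here is a small sketch of a reproducible random split. The function name, the seed and the file naming in the usage note are assumptions.

```python
import random


def split_frames(filenames, num_test=600, seed=0):
    """Randomly hold out num_test files for testing; the rest are for training."""
    files = sorted(filenames)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    test = set(rng.sample(files, num_test))
    train = [name for name in files if name not in test]
    return train, sorted(test)
```

Called on a list of 6267 frame names, this returns 5667 training names and 600 test names.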
The next step was to annotate the images. For that I used LabelImg, a graphical image annotation tool. The annotations are saved as XML files in PASCAL VOC format. Installation instructions are given in the repository.

Annotating an image using LabelImg

Similarly, I had to go through all the images of the dataset and annotate them individually.


What is a TFRecord?

If we are working with large datasets, using a binary file format to store the data can have a significant impact on the performance of our import pipeline and, as a consequence, on the training time of our model. Binary data takes up less space on disk, takes less time to copy and can be read much more efficiently from disk. This is where TFRecord comes in. However, pure performance isn’t the only advantage of the TFRecord file format. It is optimized for use with TensorFlow in multiple ways. To start with, it makes it easy to combine multiple datasets and integrates seamlessly with the data import and preprocessing functionality provided by the library. This is especially useful for datasets that are too large to fit in memory, because only the data required at the time (e.g. a batch) is loaded from disk and processed. Since we are going to work with the TensorFlow API, we will convert our XML files to TFRecords.


Converting XML to TFRecord

To convert the XML files to TFRecords, we will first convert them to CSV using a Python script, thanks to this repository. Some minor changes need to be made to the script.

The XML files stored in ‘images/train’ and ‘images/test’ are converted to two CSV files, one for train and one for test, which are generated in the folder ‘data’. (Note these paths if you want to train your SSD model on a custom dataset.)
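The conversion described above can be sketched roughly as follows, using only the standard library. This is not the script from the repository. The column names and the ‘train_labels.csv’ / ‘test_labels.csv’ file names are assumptions and must match whatever your TFRecord script expects.

```python
import csv
import glob
import os
import xml.etree.ElementTree as ET

# Assumed column layout: one CSV row per annotated bounding box.
COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def xml_to_rows(xml_dir):
    """Read every PASCAL VOC XML file in xml_dir and return one row per box."""
    rows = []
    for xml_file in sorted(glob.glob(os.path.join(xml_dir, "*.xml"))):
        root = ET.parse(xml_file).getroot()
        filename = root.findtext("filename")
        width = int(root.findtext("size/width"))
        height = int(root.findtext("size/height"))
        for obj in root.findall("object"):
            box = obj.find("bndbox")
            rows.append([
                filename, width, height, obj.findtext("name"),
                int(box.findtext("xmin")), int(box.findtext("ymin")),
                int(box.findtext("xmax")), int(box.findtext("ymax")),
            ])
    return rows


def write_csv(rows, csv_path):
    """Write the rows to csv_path with a header line, creating the folder if needed."""
    os.makedirs(os.path.dirname(csv_path) or ".", exist_ok=True)
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        writer.writerows(rows)


def convert_all(image_root="images", out_dir="data"):
    """Convert images/train and images/test into data/<split>_labels.csv."""
    for split in ("train", "test"):
        rows = xml_to_rows(os.path.join(image_root, split))
        write_csv(rows, os.path.join(out_dir, split + "_labels.csv"))
```

Calling convert_all() from the project root would produce the two CSV files in ‘data’.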

Once the XML files have been converted to CSV files, we can generate the TFRecords using a Python script from the same repository, again with some changes.

To train on your own dataset, change the class names in the class_text_to_int function. Also, make sure you follow the installation instructions over here to install the dependencies needed to run these scripts. Clone the tensorflow repository as well. It will be helpful later on.
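For the six logos in this post, class_text_to_int could look something like the sketch below. The order of the classes, and therefore the ids, is an assumption. The only hard requirement is that the ids start at 1 and agree with the label map. Raising an error for an unknown label is a choice made here so that typos in the annotations surface early.

```python
# Assumed class order; ids start at 1 because 0 is reserved for the background.
CLASS_NAMES = ["fizz", "oppo", "samsung", "garnier", "faber", "cpplus"]
CLASS_IDS = {name: index for index, name in enumerate(CLASS_NAMES, start=1)}


def class_text_to_int(row_label):
    """Map a class name from the CSV file to its integer id."""
    try:
        return CLASS_IDS[row_label]
    except KeyError:
        raise ValueError("Unknown class label: %r" % (row_label,))
```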

Once you are done with the installation, we are all set to generate our TFRecords with generate_tfrecord.py. Type this in your terminal to generate the TFRecord for the training data.

python generate_tfrecord.py --csv_input=data/train_labels.csv --output_path=data/train.record

Ahh, finally! We have our train.record. Now do the same for the test data.

python generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=data/test.record

Training Brand Logos Detector

To get our brand logo detector, we can either start from a pre-trained model and use transfer learning to teach it new objects, or train a model entirely from scratch. The benefit of transfer learning is that training can be much quicker and far less data is needed. For this reason, we are going to use transfer learning here. TensorFlow has quite a few pre-trained models with checkpoint files available, along with configuration files.

For this task I used SSD with Inception v2 (ssd_inception_v2_coco_2017_11_17), but you can always use some other model. You can get a list of models and their download links from here. To get the configuration file for your model, click here. Now that we have downloaded the model and its configuration file, we need to edit the configuration file according to our dataset.

In your configuration file, search for “PATH_TO_BE_CONFIGURED” and replace each occurrence with the corresponding path on your system: the pre-trained checkpoint, the train and test TFRecords, and the label map. Also, change the number of classes in the config file.
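If you prefer not to edit the file by hand, the same changes can be scripted. The sketch below works on the text of the config file and assumes it has the layout of the sample configs: one num_classes field, one fine_tune_checkpoint field, and an input_path for the training reader followed by one for the evaluation reader. The function name is hypothetical, and the paths you pass in depend on where you stored your files.

```python
import re


def configure_pipeline(text, num_classes, checkpoint, train_record,
                       test_record, label_map):
    """Return the config text with the class count and all paths filled in."""
    text = re.sub(r"num_classes:\s*\d+",
                  lambda m: "num_classes: %d" % num_classes, text)
    text = re.sub(r'fine_tune_checkpoint:\s*"[^"]*"',
                  lambda m: 'fine_tune_checkpoint: "%s"' % checkpoint, text)
    text = re.sub(r'label_map_path:\s*"[^"]*"',
                  lambda m: 'label_map_path: "%s"' % label_map, text)
    # Assumption: the first input_path belongs to the training reader and the
    # second to the evaluation reader; any further ones are left unchanged.
    records = iter([train_record, test_record])
    text = re.sub(r'input_path:\s*"[^"]*"',
                  lambda m: 'input_path: "%s"' % next(records, None)
                  if False else _next_or_keep(records, m), text)
    return text


def _next_or_keep(records, match):
    """Use the next record path if one is left, else keep the original field."""
    path = next(records, None)
    if path is None:
        return match.group(0)
    return 'input_path: "%s"' % path
```

To use it, read the config file into a string, pass it through configure_pipeline with num_classes=6 and your own paths, and write the result back. Afterwards no “PATH_TO_BE_CONFIGURED” should remain in the file.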

One last thing remains before we can start training: creating the label map. The label map is basically a dictionary that contains the id and name of each class we want to detect.

detection.pbtxt
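A label map is a plain-text file with one item block per class. As a rough sketch, the text for the six logos could be generated like this. The order of the classes is an assumption. What matters is that ids start at 1 and match the ids used when the TFRecords were generated.

```python
def build_label_map(class_names):
    """Return label map text with one item block per class, ids starting at 1."""
    items = []
    for class_id, name in enumerate(class_names, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}\n" % (class_id, name))
    return "\n".join(items)


# Assumed class order for this post's six logos.
LABEL_MAP_TEXT = build_label_map(
    ["fizz", "oppo", "samsung", "garnier", "faber", "cpplus"])
```

Writing LABEL_MAP_TEXT to detection.pbtxt gives a file that the label_map_path field in the config can point to.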

That’s it. We can now start training. Ahh, finally!
Copy all your data to the cloned tensorflow repository on your system (clone it if you haven’t already). Then, from within ‘models/object_detection’, type this command in your terminal.

python3 train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_inception_v2_coco_2017_11_17.config

You can stop training once the total loss has dropped to around 1.


Testing Brand Logos Detector

To test how well our model is doing, we need to export the inference graph. In the ‘models/object_detection’ directory, there is a script that does this for us: ‘export_inference_graph.py’.
To run it, you just need to pass in your checkpoint and your pipeline config. Your checkpoint files should be in the ‘training’ directory. Look for the one with the largest step (the largest number after the dash); that is the one you want to use. Next, make sure pipeline_config_path is set to the config file you chose, and finally choose a name for the output directory. For example:

python3 export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path training/ssd_inception_v2_coco_2017_11_17.config \
--trained_checkpoint_prefix training/model.ckpt-7051 \
--output_directory logos_inference_graph
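Finding the checkpoint with the largest step can also be automated. The sketch below assumes the checkpoints follow the ‘model.ckpt-<step>’ naming seen in the command above and that each one has a matching ‘.index’ file; the function name is hypothetical. TensorFlow’s own tf.train.latest_checkpoint should give the same answer if you would rather rely on the library.

```python
import glob
import os
import re


def latest_checkpoint_prefix(train_dir):
    """Return the 'model.ckpt-<step>' prefix with the largest step, or None."""
    best_step, best_prefix = -1, None
    for path in glob.glob(os.path.join(train_dir, "model.ckpt-*.index")):
        match = re.search(r"model\.ckpt-(\d+)\.index$", path)
        if match and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
            best_prefix = path[:-len(".index")]  # strip the file suffix
    return best_prefix
```

The returned value is what you would pass as --trained_checkpoint_prefix.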

Once this runs successfully, you should have a new directory named ‘logos_inference_graph’. After this, open ‘object_detection_tutorial.ipynb’, change ‘MODEL_NAME’ to ‘logos_inference_graph’ and change the number of classes in the Variables section. Next, delete the entire Download Model section of the notebook, since we no longer need to download a model. In ‘Test_Images_Path’, enter the directory where your test images are stored.


Results

Here are some of my results:

Fizz logo getting detected
Oppo and Cpplus logos getting detected
Fizz logo detected

Summary

To detect logos in images, these are the steps that I followed:

  1. Obtaining the dataset
  2. Creating XML files using LabelImg
  3. Converting the XML files to TFRecords
  4. Downloading the model and editing the corresponding configuration file
  5. Creating a label map
  6. Training the model
  7. Exporting the inference graph and testing

Source: Deep Learning on Medium