This article is a quick tutorial on what object detection is and how to perform real-time object detection using OpenCV and YOLO (You Only Look Once).

What do you mean by Object Detection?

Object detection means finding instances of objects within an image or a set of images. It is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class (such as humans, buildings, cars, balls, pens, and many more) in digital images and videos. Well-researched domains of object detection include face detection.

Where is Object Detection useful?

Object detection has many areas of application, including facial recognition, image retrieval, video surveillance, and pedestrian detection. A familiar example is tracking the ball in a football match.

Concept of Object Detection

Every object class has its own special features that help in classifying it, such as the shape of an object or its texture. For example, a basketball is differentiated from a football by its colour, texture, and other such features. A similar approach is used for object detection, where features like the shape of the object, its colour, and others are taken into consideration.

Object Detection vs. Object Recognition

Object detection and object recognition are similar techniques for identifying objects, but they vary in their execution. Object detection is the process of finding instances of objects in images. In the case of deep learning, object detection is a subset of object recognition, where the object is not only identified but also located in an image. This allows for multiple objects to be identified and located within the same image.

Object recognition

Here the program was able to identify that the given input was a strawberry with high accuracy (99%).

What is OpenCV?

OpenCV (Open Source Computer Vision Library) is a library of programming functions aimed at real-time computer vision, originally developed by Intel.

It is a cross-platform library and free to use. Its deep neural network (dnn) module can run models trained with frameworks such as Darknet (YOLO), TensorFlow, and PyTorch.

OpenCV is written in C++ and its primary interface is C++, but it also provides bindings for other languages, including Python, which this tutorial uses.

OpenCV runs on desktop operating systems such as Windows, Linux, macOS, FreeBSD, NetBSD, and OpenBSD, and on mobile operating systems such as Android, iOS, Maemo, and BlackBerry 10.

OpenCV can load YOLO models directly through its dnn module, so no separate YOLO installation is needed.

How to use YOLO with OpenCV

Let’s start,

In this tutorial we will focus on how to use YOLO with OpenCV. This is the best approach for beginners to get the algorithm working quickly without complex installations.

  1. First, we import the libraries OpenCV and NumPy.
importing libraries

2. We load the algorithm. To run the algorithm we need three files:

  • Weights file: the trained model, the core of the algorithm that detects the objects.
  • Cfg file: the configuration file, which holds all the settings of the algorithm.
  • Names file: contains the names of the objects that the algorithm can detect.
using pre-trained models
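
The loading step could be sketched roughly as follows. This is a minimal sketch, not the article's original code: the helper names `load_class_names` and `load_network` are hypothetical, and the file names you pass in depend on which YOLO model you downloaded. `cv2.dnn.readNet` and `getUnconnectedOutLayersNames` are OpenCV calls.

```python
def load_class_names(names_path):
    """Read one class label per line from the names file, skipping blanks."""
    with open(names_path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def load_network(weights_path, cfg_path):
    """Load the YOLO network and return it with its output layer names."""
    import cv2  # imported here so load_class_names works without OpenCV

    net = cv2.dnn.readNet(weights_path, cfg_path)
    # The names of the layers whose outputs hold the detections.
    output_layer_names = net.getUnconnectedOutLayersNames()
    return net, output_layer_names
```

With files named, for example, `yolov3.weights`, `yolov3.cfg`, and `coco.names` (assumed names), you would call `load_network("yolov3.weights", "yolov3.cfg")` and `load_class_names("coco.names")`.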

We then load the webcam or an existing video on which we want to perform object detection, and we also read its width and height.

loading existing video or webcam
To load a webcam, pass 0 as the source instead of a file path.
reading height and width
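
Opening the source and reading its size might look like the sketch below. The helper names `parse_source` and `open_capture` are hypothetical; `cv2.VideoCapture` and the `CAP_PROP_FRAME_WIDTH` / `CAP_PROP_FRAME_HEIGHT` properties are part of OpenCV.

```python
def parse_source(source):
    """Return an int camera index for values like "0", else the path as text."""
    text = str(source)
    return int(text) if text.isdigit() else text


def open_capture(source):
    """Open a webcam or video file and return the capture with its frame size."""
    import cv2  # imported here so parse_source has no OpenCV dependency

    cap = cv2.VideoCapture(parse_source(source))
    if not cap.isOpened():
        raise IOError(f"Could not open video source: {source!r}")
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    return cap, width, height
```

Calling `open_capture(0)` would use the default webcam, while `open_capture("video.mp4")` (an assumed file name) would read from a file.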

3. Now that we have the algorithm ready to work and also the image, it's time to pass the image into the network and do the detection. Keep in mind that we can't pass the full image to the network right away; we first need to convert it to a blob. The blob is a resized and rescaled copy of the image in the layout the network expects. YOLO accepts three sizes:

  • 320×320: small, so less accuracy but better speed.
  • 608×608: bigger, so higher accuracy but slower speed.
  • 416×416: in the middle, so you get a bit of both.

outs is an array that contains all the information about the detected objects: their positions and the confidence of each detection.
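
A minimal sketch of the blob and forward pass is shown here. It assumes `net` was loaded with `cv2.dnn.readNet` and that `output_layer_names` comes from `net.getUnconnectedOutLayersNames()`. The helper names are hypothetical, and the scale factor of 1/255, the channel swap, and the multiple-of-32 check are assumptions based on how Darknet-style YOLO models are commonly configured rather than details taken from the article.

```python
def check_input_size(size):
    """Return size if it is a positive multiple of 32, else raise ValueError."""
    if size <= 0 or size % 32 != 0:
        raise ValueError(f"input size must be a positive multiple of 32, got {size}")
    return size


def detect(net, output_layer_names, frame, size=416):
    """Convert a frame to a blob, run one forward pass, return the raw outputs."""
    check_input_size(size)
    import cv2  # imported here so check_input_size works without OpenCV

    blob = cv2.dnn.blobFromImage(
        frame,
        scalefactor=1 / 255.0,   # scale pixel values to the 0-1 range
        size=(size, size),       # resize to the network input size
        mean=(0, 0, 0),
        swapRB=True,             # OpenCV frames are BGR; swap to RGB
        crop=False,
    )
    net.setInput(blob)
    return net.forward(output_layer_names)
```

A larger `size` trades speed for accuracy, as described in the list above.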

detection zone

4. At this point, the detection is done, and we only need to filter the results and show them on the screen.

5. We loop through the outs array, read the confidence of each detection, and compare it with a confidence threshold that we choose. The threshold ranges from 0 to 1. The closer it is to 1, the more confident the remaining detections are; the closer it is to 0, the more objects are reported, but more of them are likely to be wrong.
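
The thresholding loop can be sketched in plain Python as follows. It assumes each detection row has the YOLOv3 layout `[cx, cy, w, h, objectness, class scores...]` with box values normalized to the 0-1 range; the function name `decode_detections` and the default threshold of 0.5 are illustrative choices.

```python
def decode_detections(outs, frame_width, frame_height, conf_threshold=0.5):
    """Turn raw YOLO rows into pixel boxes, keeping confident detections only."""
    boxes, confidences, class_ids = [], [], []
    for out in outs:          # one array per output layer
        for row in out:       # one row per candidate detection
            scores = list(row[5:])
            # The class with the highest score is the predicted class.
            class_id = max(range(len(scores)), key=scores.__getitem__)
            confidence = float(scores[class_id])
            if confidence <= conf_threshold:
                continue
            w = int(row[2] * frame_width)
            h = int(row[3] * frame_height)
            # YOLO reports the box centre; convert to the top-left corner.
            x = int(row[0] * frame_width - w / 2)
            y = int(row[1] * frame_height - h / 2)
            boxes.append([x, y, w, h])
            confidences.append(confidence)
            class_ids.append(class_id)
    return boxes, confidences, class_ids
```

The function accepts nested lists as well as NumPy arrays, so it can be tried on hand-written rows before wiring it to a real network.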

6. When we perform the detection, we often get several boxes for the same object, so we use another function to remove this "noise".
It's called non-maximum suppression.
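
To show what non-maximum suppression does, here is a small pure-Python sketch of the greedy version: keep the most confident box, drop any box that overlaps a kept box too much, and repeat. The function names and the overlap threshold of 0.4 are illustrative; in practice OpenCV's `cv2.dnn.NMSBoxes` serves the same purpose.

```python
def iou(box_a, box_b):
    """Intersection over union of two [x, y, w, h] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, confidences, iou_threshold=0.4):
    """Return the indices of the boxes to keep, most confident first."""
    order = sorted(range(len(boxes)), key=lambda i: confidences[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

This sketch ignores class labels, so two overlapping boxes of different classes would also be merged into one.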

7. Finally, we extract all the information and show it on the screen.

  • Box: contains the coordinates of the rectangle surrounding the detected object.
  • Label: the name of the detected object.
  • Confidence: the confidence of the detection, from 0 to 1.
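
Drawing these three pieces of information on a frame might look like the sketch below. The helper names are hypothetical, `keep` is assumed to be the list of box indices that survived non-maximum suppression, and the green colour and font settings are arbitrary choices. `cv2.rectangle` and `cv2.putText` are OpenCV drawing calls.

```python
def format_label(class_name, confidence):
    """Build the text shown above a box, e.g. 'person 0.87'."""
    return f"{class_name} {confidence:.2f}"


def draw_detections(frame, boxes, confidences, class_ids, classes, keep):
    """Draw a rectangle and a label for every kept detection."""
    import cv2  # imported here so format_label works without OpenCV

    green = (0, 255, 0)
    for i in keep:
        x, y, w, h = boxes[i]
        label = format_label(classes[class_ids[i]], confidences[i])
        cv2.rectangle(frame, (x, y), (x + w, y + h), green, 2)
        # Place the text just above the box, but never off the top edge.
        cv2.putText(frame, label, (x, max(y - 5, 15)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, green, 2)
    return frame
```

The annotated frame can then be displayed with `cv2.imshow` inside the video loop.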

Reference Code

The reference code and other files can be found in this Git repository:

shantamsultania/object_detection_using_yolo_in_video_and_webcam

Written by

Shantam Sultania - CHANDIGARH UNIVERSITY - Chandigarh, Chandigarh, India | LinkedIn