In this tutorial you will learn how to build a “people counter” with OpenCV and Python. Using OpenCV, we’ll count the number of people who are heading “in” or “out” of a department store in real-time.

Building a person counter with OpenCV has been one of the most-requested topics here on the PyImageSearch and I’ve been meaning to do a blog post on people counting for a year now — I’m incredibly thrilled to be publishing it and sharing it with you today.

Enjoy the tutorial and let me know what you think in the comments section at the bottom of the post!

To get started building a people counter with OpenCV, just keep reading!

Looking for the source code to this post?
Jump right to the downloads section.

OpenCV People Counter with Python

In the first part of today’s blog post, we’ll be discussing the required Python packages you’ll need to build our people counter.

From there I’ll provide a brief discussion on the difference between object detection and object tracking, along with how we can leverage both to create a more accurate people counter.

Afterwards, we’ll review the directory structure for the project and then implement the entire person counting project.

Finally, we’ll examine the results of applying people counting with OpenCV to actual videos.

Required Python libraries for people counting

In order to build our people counting applications, we’ll need a number of different Python libraries, including:

Additionally, you’ll also want to use the “Downloads” section of this blog post to download my source code which includes:

  1. My special
      module which we’ll implement and use later in this post
  2. The Python driver script used to start the people counter
  3. All example videos used here in the post

I’m going to assume you already have NumPy, OpenCV, and dlib installed on your system.

If you don’t have OpenCV installed, you’ll want to head to my OpenCV install page and follow the relevant tutorial for your particular operating system.

If you need to install dlib, you can use this guide.

Finally, you can install/upgrade your imutils via the following command:

$ pip install --upgrade imutils

Understanding object detection vs. object tracking

There is a fundamental difference between object detection and object tracking that you must understand before we proceed with the rest of this tutorial.

When we apply object detection we are determining where in an image/frame an object is. An object detector is also typically more computationally expensive, and therefore slower, than an object tracking algorithm. Examples of object detection algorithms include Haar cascades, HOG + Linear SVM, and deep learning-based object detectors such as Faster R-CNNs, YOLO, and Single Shot Detectors (SSDs).

An object tracker, on the other hand, will accept the input (x, y)-coordinates of where an object is in an image and will:

  1. Assign a unique ID to that particular object
  2. Track the object as it moves around a video stream, predicting the new object location in the next frame based on various attributes of the frame (gradient, optical flow, etc.)

Examples of object tracking algorithms include MedianFlow, MOSSE, GOTURN, kernalized correlation filters, and discriminative correlation filters, to name a few.

If you’re interested in learning more about the object tracking algorithms built into OpenCV, be sure to refer to this blog post.

Combining both object detection and object tracking

Highly accurate object trackers will combine the concept of object detection and object tracking into a single algorithm, typically divided into two phases:

  • Phase 1 — Detecting: During the detection phase we are running our computationally more expensive object tracker to (1) detect if new objects have entered our view, and (2) see if we can find objects that were “lost” during the tracking phase. For each detected object we create or update an object tracker with the new bounding box coordinates. Since our object detector is more computationally expensive we only run this phase once every N frames.
  • Phase 2 — Tracking: When we are not in the “detecting” phase we are in the “tracking” phase. For each of our detected objects, we create an object tracker to track the object as it moves around the frame. Our object tracker should be faster and more efficient than the object detector. We’ll continue tracking until we’ve reached the N-th frame and then re-run our object detector. The entire process then repeats.

The benefit of this hybrid approach is that we can apply highly accurate object detection methods without as much of the computational burden. We will be implementing such a tracking system to build our people counter.

Project structure

Let’s review the project structure for today’s blog post. Once you’ve grabbed the code from the “Downloads” section, you can inspect the directory structure with the

$ tree --dirsfirst
├── pyimagesearch
│   ├──
│   ├──
│   └──
├── mobilenet_ssd
│   ├── MobileNetSSD_deploy.caffemodel
│   └── MobileNetSSD_deploy.prototxt
├── videos
│   ├── example_01.mp4
│   └── example_02.mp4
├── output
│   ├── output_01.avi
│   └── output_02.avi

4 directories, 10 files

Zeroing in on the most-important two directories, we have:

  1. pyimagesearch/
     : This module contains the centroid tracking algorithm. The centroid tracking algorithm is covered in the “Combining object tracking algorithms” section, but the code is not. For a review of the centroid tracking code (
     ) you should refer to the first post in the series.
  2. mobilenet_ssd/
     : Contains the Caffe deep learning model files. We’ll be using a MobileNet Single Shot Detector (SSD) which is covered at the top of this blog post in the section, “Single Shot Detectors for object detection”.

The heart of today’s project is contained within the
  script — that’s where we’ll spend most of our time. We’ll also review the
  script today.

Combining object tracking algorithms

Figure 1: An animation demonstrating the steps in the centroid tracking algorithm.

To implement our people counter we’ll be using both OpenCV and dlib. We’ll use OpenCV for standard computer vision/image processing functions, along with the deep learning object detector for people counting.

We’ll then use dlib for its implementation of correlation filters. We could use OpenCV here as well; however, the dlib object tracking implementation was a bit easier to work with for this project.

I’ll be including a deep dive into dlib’s object tracking algorithm in next week’s post.

Along with dlib’s object tracking implementation, we’ll also be using my implementation of centroid tracking from a few weeks ago. Reviewing the entire centroid tracking algorithm is outside the scope of this blog post, but I’ve included a brief overview below.

At Step #1 we accept a set of bounding boxes and compute their corresponding centroids (i.e., the center of the bounding boxes):

Figure 2: To build a simple object tracking via centroids script with Python, the first step is to accept bounding box coordinates and use them to compute centroids.

The bounding boxes themselves can be provided by either:

  1. An object detector (such as HOG + Linear SVM, Faster R- CNN, SSDs, etc.)
  2. Or an object tracker (such as correlation filters)

In the above image you can see that we have two objects to track in this initial iteration of the algorithm.

During Step #2 we compute the Euclidean distance between any new centroids (yellow) and existing centroids (purple):

Figure 3: Three objects are present in this image. We need to compute the Euclidean distance between each pair of original centroids (red) and new centroids (green).

The centroid tracking algorithm makes the assumption that pairs of centroids with minimum Euclidean distance between them must be the same object ID.

In the example image above we have two existing centroids (purple) and three new centroids (yellow), implying that a new object has been detected (since there is one more new centroid vs. old centroid).

The arrows then represent computing the Euclidean distances between all purple centroids and all yellow centroids.

Once we have the Euclidean distances we attempt to associate object IDs in Step #3:

Figure 4: Our simple centroid object tracking method has associated objects with minimized object distances. What do we do about the object in the bottom-left though?

In Figure 4 you can see that our centroid tracker has chosen to associate centroids that minimize their respective Euclidean distances.

But what about the point in the bottom-left?

It didn’t get associated with anything — what do we do?

To answer that question we need to perform Step #4, registering new objects:

Figure 5: In our object tracking example, we have a new object that wasn’t matched with an existing object, so it is registered as object ID #3.

Registering simply means that we are adding the new object to our list of tracked objects by:

  1. Assigning it a new object ID
  2. Storing the centroid of the bounding box coordinates for the new object

In the event that an object has been lost or has left the field of view, we can simply deregister the object (Step #5).

Exactly how you handle when an object is “lost” or is “no longer visible” really depends on your exact application, but for our people counter, we will deregister people IDs when they cannot be matched to any existing person objects for 40 consecutive frames.

Again, this is only a brief overview of the centroid tracking algorithm.

Note: For a more detailed review, including an explanation of the source code used to implement centroid tracking, be sure to refer to this post.

Creating a “trackable object”

In order to track and count an object in a video stream, we need an easy way to store information regarding the object itself, including:

  • It’s object ID
  • It’s previous centroids (so we can easily to compute the direction the object is moving)
  • Whether or not the object has already been counted

To accomplish all of these goals we can define an instance of

  — open up the
  file and insert the following code:
class TrackableObject:
	def __init__(self, objectID, centroid):
		# store the object ID, then initialize a list of centroids
		# using the current centroid
		self.objectID = objectID
		self.centroids = [centroid]

		# initialize a boolean used to indicate if the object has
		# already been counted or not
		self.counted = False


  constructor accepts an
  and stores them. The centroids variable is a list because it will contain an object’s centroid location history.

The constructor also initializes

 , indicating that the object has not been counted yet.

Implementing our people counter with OpenCV + Python

With all of our supporting Python helper tools and classes in place, we are now ready to built our OpenCV people counter.

Open up your
  file and insert the following code:
# import the necessary packages
from pyimagesearch.centroidtracker import CentroidTracker
from pyimagesearch.trackableobject import TrackableObject
from import VideoStream
from import FPS
import numpy as np
import argparse
import imutils
import time
import dlib
import cv2

We begin by importing our necessary packages:

  • From the
      module, we import our custom
  • The 
      modules from
      will help us to work with a webcam and to calculate the estimated Frames Per Second (FPS) throughput rate.
  • We need
      for its OpenCV convenience functions.
  • The
      library will be used for its correlation tracker implementation.
  • OpenCV will be used for deep neural network inference, opening video files, writing video files, and displaying output frames to our screen.

Now that all of the tools are at our fingertips, let’s parse command line arguments:

# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-p", "--prototxt", required=True,
	help="path to Caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True,
	help="path to Caffe pre-trained model")
ap.add_argument("-i", "--input", type=str,
	help="path to optional input video file")
ap.add_argument("-o", "--output", type=str,
	help="path to optional output video file")
ap.add_argument("-c", "--confidence", type=float, default=0.4,
	help="minimum probability to filter weak detections")
ap.add_argument("-s", "--skip-frames", type=int, default=30,
	help="# of skip frames between detections")
args = vars(ap.parse_args())

We have six command line arguments which allow us to pass information to our people counter script from the terminal at runtime:

  • --prototxt
     : Path to the Caffe “deploy” prototxt file.
  • --model
     : The path to the Caffe pre-trained CNN model.
  • --input
     : Optional input video file path. If no path is specified, your webcam will be utilized.
  • --output
     : Optional output video path. If no path is specified, a video will not be recorded.
  • --confidence
     : With a default value of
     , this is the minimum probability threshold which helps to filter out weak detections.
  • --skip-frames
     : The number of frames to skip before running our DNN detector again on the tracked object. Remember, object detection is computationally expensive, but it does help our tracker to reassess objects in the frame. By default we skip
      frames between detecting objects with the OpenCV DNN module and our CNN single shot detector model.

Now that our script can dynamically handle command line arguments at runtime, let’s prepare our SSD:

# initialize the list of class labels MobileNet SSD was trained to
# detect
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
	"bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
	"dog", "horse", "motorbike", "person", "pottedplant", "sheep",
	"sofa", "train", "tvmonitor"]

# load our serialized model from disk
print("[INFO] loading model...")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

First, we’ll initialize

  — the list of classes that our SSD supports. This list should not be changed if you’re using the model provided in the “Downloads”We’re only interested in the “person” class, but you could count other moving objects as well (however, if your “pottedplant”, “sofa”, or “tvmonitor” grows legs and starts moving, you should probably run out of your house screaming rather than worrying about counting them! 😋 ).

On Line 38 we load our pre-trained MobileNet SSD used to detect objects (but again, we’re just interested in detecting and tracking people, not any other class). To learn more about MobileNet and SSDs, please refer to my previous blog post.

From there we can initialize our video stream:

# if a video path was not supplied, grab a reference to the webcam
if not args.get("input", False):
	print("[INFO] starting video stream...")
	vs = VideoStream(src=0).start()

# otherwise, grab a reference to the video file
	print("[INFO] opening video file...")
	vs = cv2.VideoCapture(args["input"])

First we handle the case where we’re using a webcam video stream (Lines 41-44). Otherwise, we’ll be capturing frames from a video file (Lines 47-49).

We still have a handful of initializations to perform before we begin looping over frames:

# initialize the video writer (we'll instantiate later if need be)
writer = None

# initialize the frame dimensions (we'll set them as soon as we read
# the first frame from the video)
W = None
H = None

# instantiate our centroid tracker, then initialize a list to store
# each of our dlib correlation trackers, followed by a dictionary to
# map each unique object ID to a TrackableObject
ct = CentroidTracker(maxDisappeared=40, maxDistance=50)
trackers = []
trackableObjects = {}

# initialize the total number of frames processed thus far, along
# with the total number of objects that have moved either up or down
totalFrames = 0
totalDown = 0
totalUp = 0

# start the frames per second throughput estimator
fps = FPS().start()

The remaining initializations include:

  • writer
     : Our video writer. We’ll instantiate this object later if we are writing to video.
  • W
     : Our frame dimensions. We’ll need to plug these into
  • ct
     : Our
     . For details on the implementation of
     , be sure to refer to my blog post from a few weeks ago.
  • trackers
     : A list to store the dlib correlation trackers. To learn about dlib correlation tracking stay tuned for next week’s post.
  • trackableObjects
     : A dictionary which maps an
      to a
  • totalFrames
     : The total number of frames processed.
  • totalDown
     : The total number of objects/people that have moved either down or up. These variables measure the actual “people counting” results of the script.
  • fps
     : Our frames per second estimator for benchmarking.

Note: If you get lost in the

  loop below, you should refer back to this bulleted listing of important variables.

Now that all of our initializations are taken care of, let’s loop over incoming frames:

# loop over frames from the video stream
while True:
	# grab the next frame and handle if we are reading from either
	# VideoCapture or VideoStream
	frame =
	frame = frame[1] if args.get("input", False) else frame

	# if we are viewing a video and we did not grab a frame then we
	# have reached the end of the video
	if args["input"] is not None and frame is None:

	# resize the frame to have a maximum width of 500 pixels (the
	# less data we have, the faster we can process it), then convert
	# the frame from BGR to RGB for dlib
	frame = imutils.resize(frame, width=500)
	rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

	# if the frame dimensions are empty, set them
	if W is None or H is None:
		(H, W) = frame.shape[:2]

	# if we are supposed to be writing a video to disk, initialize
	# the writer
	if args["output"] is not None and writer is None:
		fourcc = cv2.VideoWriter_fourcc(*"MJPG")
		writer = cv2.VideoWriter(args["output"], fourcc, 30,
			(W, H), True)

We begin looping on Line 76. At the top of the loop we grab the next

  (Lines 79 and 80). In the event that we’ve reached the end of the video, we’ll
  out of the loop (Lines 84 and 85).

Preprocessing the

  takes place on Lines 90 and 91. This includes resizing and swapping color channels as dlib requires an

We grab the dimensions of the

  for the video
  (Lines 94 and 95).

From there we’ll instantiate the video

  if an output path was provided via command line argument (Lines 99-102). To learn more about writing video to disk, be sure to refer to this post.

Now let’s detect people using the SSD:

# initialize the current status along with our list of bounding
	# box rectangles returned by either (1) our object detector or
	# (2) the correlation trackers
	status = "Waiting"
	rects = []

	# check to see if we should run a more computationally expensive
	# object detection method to aid our tracker
	if totalFrames % args["skip_frames"] == 0:
		# set the status and initialize our new set of object trackers
		status = "Detecting"
		trackers = []

		# convert the frame to a blob and pass the blob through the
		# network and obtain the detections
		blob = cv2.dnn.blobFromImage(frame, 0.007843, (W, H), 127.5)
		detections = net.forward()

We initialize a

  as “Waiting” on Line 107. Possible
  states include:
  • Waiting: In this state, we’re waiting on people to be detected and tracked.
  • Detecting: We’re actively in the process of detecting people using the MobileNet SSD.
  • Tracking: People are being tracked in the frame and we’re counting the


  list will be populated either via detection or tracking. We go ahead and initialize
  on Line 108.

It’s important to understand that deep learning object detectors are very computationally expensive, especially if you are running them on your CPU.

To avoid running our object detector on every frame, and to speed up our tracking pipeline, we’ll be skipping every N frames (set by command line argument

  is the default). Only every N frames will we exercise our SSD for object detection. Otherwise, we’ll simply be tracking moving objects in-between.

Using the modulo operator on Line 112 we ensure that we’ll only execute the code in the if-statement every N frames.

Assuming we’ve landed on a multiple of

 , we’ll update the
  to “Detecting” (Line 114).

Then we initialize our new list of

  (Line 115).

Next, we’ll perform inference via object detection. We begin by creating a

  from the image, followed by passing the
  through the net to obtain
  (Lines 119-121).

Now we’ll loop over each of the

  in hopes of finding objects belonging to the “person” class:
# loop over the detections
		for i in np.arange(0, detections.shape[2]):
			# extract the confidence (i.e., probability) associated
			# with the prediction
			confidence = detections[0, 0, i, 2]

			# filter out weak detections by requiring a minimum
			# confidence
			if confidence > args["confidence"]:
				# extract the index of the class label from the
				# detections list
				idx = int(detections[0, 0, i, 1])

				# if the class label is not a person, ignore it
				if CLASSES[idx] != "person":

Looping over

  on Line 124, we proceed to grab the
  (Line 127) and filter out weak results + those that don’t belong to the “person” class (Lines 131-138).

Now we can compute a bounding box for each person and begin correlation tracking:

# compute the (x, y)-coordinates of the bounding box
				# for the object
				box = detections[0, 0, i, 3:7] * np.array([W, H, W, H])
				(startX, startY, endX, endY) = box.astype("int")

				# construct a dlib rectangle object from the bounding
				# box coordinates and then start the dlib correlation
				# tracker
				tracker = dlib.correlation_tracker()
				rect = dlib.rectangle(startX, startY, endX, endY)
				tracker.start_track(rgb, rect)

				# add the tracker to our list of trackers so we can
				# utilize it during skip frames

Computing our bounding

  takes place on Lines 142 and 143.

Then we instantiate our dlib correlation

  on Line 148, followed by passing in the object’s bounding box coordinates to
 , storing the result as
  (Line 149).

Subsequently, we start tracking on Line 150 and append the

  to the
  list on Line 154.

That’s a wrap for all operations we do every N skip-frames!

Let’s take care of the typical operations where tracking is taking place in the

# otherwise, we should utilize our object *trackers* rather than
	# object *detectors* to obtain a higher frame processing throughput
		# loop over the trackers
		for tracker in trackers:
			# set the status of our system to be 'tracking' rather
			# than 'waiting' or 'detecting'
			status = "Tracking"

			# update the tracker and grab the updated position
			pos = tracker.get_position()

			# unpack the position object
			startX = int(pos.left())
			startY = int(
			endX = int(pos.right())
			endY = int(pos.bottom())

			# add the bounding box coordinates to the rectangles list
			rects.append((startX, startY, endX, endY))

Most of the time, we aren’t landing on a skip-frame multiple. During this time, we’ll utilize our

  to track our object rather than applying detection.

We begin looping over the available

  on Line 160.

We proceed to update the

  to “Tracking” (Line 163) and grab the object position (Lines 166 and 167).

From there we extract the position coordinates (Lines 170-173) followed by populating the information in our


Now let’s draw a horizontal visualization line (that people must cross in order to be tracked) and use the centroid tracker to update our object centroids:

# draw a horizontal line in the center of the frame -- once an
	# object crosses this line we will determine whether they were
	# moving 'up' or 'down'
	cv2.line(frame, (0, H // 2), (W, H // 2), (0, 255, 255), 2)

	# use the centroid tracker to associate the (1) old object
	# centroids with (2) the newly computed object centroids
	objects = ct.update(rects)

On Line 181 we draw the horizontal line which we’ll be using to visualize people “crossing” — once people cross this line we’ll increment our respective counters

Then on Line 185, we utilize our

  instantiation to accept the list of
 , regardless of whether they were generated via object detection or object tracking. Our centroid tracker will associate object IDs with object locations.

In this next block, we’ll review the logic which counts if a person has moved up or down through the frame:

# loop over the tracked objects
	for (objectID, centroid) in objects.items():
		# check to see if a trackable object exists for the current
		# object ID
		to = trackableObjects.get(objectID, None)

		# if there is no existing trackable object, create one
		if to is None:
			to = TrackableObject(objectID, centroid)

		# otherwise, there is a trackable object so we can utilize it
		# to determine direction
			# the difference between the y-coordinate of the *current*
			# centroid and the mean of *previous* centroids will tell
			# us in which direction the object is moving (negative for
			# 'up' and positive for 'down')
			y = [c[1] for c in to.centroids]
			direction = centroid[1] - np.mean(y)

			# check to see if the object has been counted or not
			if not to.counted:
				# if the direction is negative (indicating the object
				# is moving up) AND the centroid is above the center
				# line, count the object
				if direction < 0 and centroid[1] < H // 2:
					totalUp += 1
					to.counted = True

				# if the direction is positive (indicating the object
				# is moving down) AND the centroid is below the
				# center line, count the object
				elif direction > 0 and centroid[1] > H // 2:
					totalDown += 1
					to.counted = True

		# store the trackable object in our dictionary
		trackableObjects[objectID] = to

We begin by looping over the updated bounding box coordinates of the object IDs (Line 188).

On Line 191 we attempt to fetch a

  for the current

If the

  doesn’t exist for the
 , we create one (Lines 194 and 195).

Otherwise, there is already an existing

 , so we need to figure out if the object (person) is moving up or down.

To do so, we grab the y-coordinate value for all previous centroid locations for the given object (Line 204). Then we compute the

  by taking the difference between the current centroid location and the mean of all previous centroid locations (Line 205).

The reason we take the mean is to ensure our direction tracking is more stable. If we stored just the previous centroid location for the person we leave ourselves open to the possibility of false direction counting. Keep in mind that object detection and object tracking algorithms are not “magic” — sometimes they will predict bounding boxes that may be slightly off what you may expect; therefore, by taking the mean, we can make our people counter more accurate.

If the

  has not been
  (Line 209), we need to determine if it’s ready to be counted yet (Lines 213-222), by:
  1. Checking if the
      is negative (indicating the object is moving Up) AND the centroid is Above the centerline. In this case we increment
  2. Or checking if the
      is positive (indicating the object is moving Down) AND the centroid is Below the centerline. If this is true, we increment

Finally, we store the

  in our
  dictionary (Line 225) so we can grab and update it when the next frame is captured.

We’re on the home-stretch!

The next three code blocks handle:

  1. Display (drawing and writing text to the frame)
  2. Writing frames to a video file on disk (if the
      command line argument is present)
  3. Capturing keypresses
  4. Cleanup

First we’ll draw some information on the frame for visualization:

# draw both the ID of the object and the centroid of the
		# object on the output frame
		text = "ID {}".format(objectID)
		cv2.putText(frame, text, (centroid[0] - 10, centroid[1] - 10),
			cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2), (centroid[0], centroid[1]), 4, (0, 255, 0), -1)

	# construct a tuple of information we will be displaying on the
	# frame
	info = [
		("Up", totalUp),
		("Down", totalDown),
		("Status", status),

	# loop over the info tuples and draw them on our frame
	for (i, (k, v)) in enumerate(info):
		text = "{}: {}".format(k, v)
		cv2.putText(frame, text, (10, H - ((i * 20) + 20)),
			cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)

Here we overlay the following data on the frame:

  • ObjectID
     : Each object’s numerical identifier.
  • centroid
      : The center of the object will be represented by a “dot” which is created by filling in a circle.
  • info
      : Includes
     , and

For a review of drawing operations, be sure to refer to this blog post.

Then we’ll write the

  to a video file (if necessary) and handle keypresses:
# check to see if we should write the frame to disk
	if writer is not None:

	# show the output frame
	cv2.imshow("Frame", frame)
	key = cv2.waitKey(1) & 0xFF

	# if the `q` key was pressed, break from the loop
	if key == ord("q"):

	# increment the total number of frames processed thus far and
	# then update the FPS counter
	totalFrames += 1

In this block we:

  • Write the 
     , if necessary, to the output video file (Lines 249 and 250)
  • Display the
      and handle keypresses (Lines 253-258). If “q” is pressed, we
      out of the frame processing loop.
  • Update our
      counter (Line 263)

We didn’t make too much of a mess, but now it’s time to clean up:

# stop the timer and display FPS information
print("[INFO] elapsed time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

# check to see if we need to release the video writer pointer
if writer is not None:

# if we are not using a video file, stop the camera video stream
if not args.get("input", False):

# otherwise, release the video file pointer

# close any open windows

To finish out the script, we display the FPS info to the terminal, release all pointers, and close any open windows.

Just 283 lines of code later, we are now done 😎.

People counting results

To see our OpenCV people counter in action, make sure you use the “Downloads” section of this blog post to download the source code and example videos.

From there, open up a terminal and execute the following command:

$ python --prototxt mobilenet_ssd/MobileNetSSD_deploy.prototxt \
	--model mobilenet_ssd/MobileNetSSD_deploy.caffemodel \
	--input videos/example_01.mp4 --output output/output_01.avi
[INFO] loading model...
[INFO] opening video file...
[INFO] elapsed time: 37.27
[INFO] approx. FPS: 34.42

Here you can see that our person counter is counting the number of people who:

  1. Are entering the department store (down)
  2. And the number of people who are leaving (up)

At the end of the first video you’ll see there have been 7 people who entered and 3 people who have left.

Furthermore, examining the terminal output you’ll see that our person counter is capable of running in real-time, obtaining 34 FPS throughout.  This is despite the fact that we are using a deep learning object detector for more accurate person detections.

Our 34 FPS throughout rate is made possible through our two-phase process of:

  1. Detecting people once every 30 frames
  2. And then applying a faster, more efficient object tracking algorithm in all frames in between.

Another example of people counting with OpenCV can be seen below:

$ python --prototxt mobilenet_ssd/MobileNetSSD_deploy.prototxt \
	--model mobilenet_ssd/MobileNetSSD_deploy.caffemodel \
	--input videos/example_01.mp4 --output output/output_02.avi 
[INFO] loading model...
[INFO] opening video file...
[INFO] elapsed time: 36.88
[INFO] approx. FPS: 34.79

I’ve included a short GIF below to give you an idea of how the algorithm works:

Figure 7: An example of an OpenCV people counter in action.

A full video of the demo can be seen below:

This time there have been 2 people who have entered the department store and 14 people who have left.

You can see how useful this system would be to a store owner interested in foot traffic analytics.

The same type of system for counting foot traffic with OpenCV can be used to count automobile traffic with OpenCV and I hope to cover that topic in a future blog post.

Additionally, a big thank you to David McDuffee for recording the example videos used here today! David works here with me at PyImageSearch and if you’ve ever emailed PyImageSearch before, you have very likely interacted with him. Thank you for making this post possible, David! Also a thank you to BenSound for providing the music for the video demos included in this post.

What are the next steps?

Congratulations on building your person counter with OpenCV!

If you’re interested in learning more about OpenCV, including building other real-world applications, including face detection, object recognition, and more, I would suggest reading through my book, Practical Python and OpenCV + Case Studies.

Practical Python and OpenCV is meant to be a gentle introduction to the world of computer vision and image processing. This book is perfect if you:

  • Are new to the world of computer vision and image processing
  • Have some past image processing experience but are new to Python
  • Are looking for some great example projects to get your feet wet


Learn OpenCV fundamentals in a single weekend!

If you’re looking for a more detailed dive into computer vision, I would recommend working through the PyImageSearch Gurus course. The PyImageSearch Gurus course is similar to a college survey course and many students report that they learn more than a typical university class.

Inside you’ll find over 168 lessons, starting with the fundamentals of computer vision, all the way up to more advanced topics, including:

  • Face recognition
  • Automatic license plate recognition
  • Training your own custom object detectors
  • …and much more!

You’ll also find a thriving community of like-minded individuals who are itching to learn about computer vision. Each day in the community forums we discuss:

  • Your burning questions about computer vision
  • New project ideas and resources
  • Kaggle and other competitions
  • Development environment and code issues
  • …among many other topics!

Master computer vision inside PyImageSearch Gurus!


In today’s blog post we learned how to build a people counter using OpenCV and Python.

Our implementation is:

  • Capable of running in real-time on a standard CPU
  • Utilizes deep learning object detectors for improved person detection accuracy
  • Leverages two separate object tracking algorithms, including both centroid tracking and correlation filters for improved tracking accuracy
  • Applies both a “detection” and “tracking” phase, making it capable of (1) detecting new people and (2) picking up people that may have been “lost” during the tracking phase

I hope you enjoyed today’s post on people counting with OpenCV!

To download the code to this blog post (and apply people counting to your own projects), just enter your email address in the form below!