Computer vision has long been an interest of mine, and recently I set out to create a boilerplate project that students can use as a springboard to something greater. This led me to build a quick and lean demonstration of OpenCV built into a simple Flask application. The task: to build a fun facial recognition platform that will replicate some of the popular phone app “filters” and apply funny masks on top of a video or image, which in my app came in the form of a nice cartoony Ninja mask. The tech: Python, OpenCV, and Flask. This was a fun project, and I hope others will take and modify this code to create some interesting apps. I’ll outline some of the reasons I chose the tech that I did as well as talk about my experience during the actual building of the application.

Python is taught at Coding Dojo for a few reasons, but I’d say the biggest is that the language itself is quite approachable. We get students that come to the program from different backgrounds and experience levels, and Python is a great language for veterans and newbies alike. The extensive low-level libraries that Python can take advantage of allow us to do some pretty radical things, which is why data scientists and machine learning engineers have adopted it en masse. I try to remind my students that even though something might seem simple, it doesn’t mean that it isn’t robust, which I feel perfectly illustrates the strengths of Python.

One of these powerful Python libraries is OpenCV. Created in 1999, OpenCV has been an ongoing project to bring computer vision to the masses, with applications in both business and the hobby world. The library is filled with over 2500 different algorithms that can help developers do something like track a rolling ball, run bitwise operations on photos or frames of video, and even track human faces or cars driving on the freeway. OpenCV also offers the ability to train one’s own model in order to identify and differentiate something like a smile versus a frown. The only limit is one’s creativity: the tools are all there in OpenCV, which has interfaces for Python, C++, MATLAB, and Java.

Flask, the last piece of tech I used for this experiment, isn’t designed to be the most extensive framework; it is actually called a micro-framework by its developers. I chose Flask over Django for the speed of getting started. I wanted to be able to focus on the computer vision side of things without the overhead that something like Django brings to a small project like this. I have always been a tinkerer at heart, and the idea was for this project to be a foundational sandbox more than anything else.

Actually implementing OpenCV on its own isn’t bad, and I will provide some reference material that will allow anyone to get some images displayed on the screen and some video streams going. The first part to tackle is the video stream. After installing opencv-python on my machine and importing it into my Flask application, the video portion of the project was as easy as setting up a new VideoCamera class, generating the camera’s output, and then using an <img> tag on the index page to show the produced stream. All in all, this step mainly relies on a little understanding of how OpenCV wants to handle video, and how to produce a stream within a web application using a generator function.

After establishing a video stream and seeing my beautiful mug on my laptop screen, I next worked on harnessing OpenCV to get some tracking started. There are a slew of trained models included in the library to choose from, but I settled on frontal face detection and eye detection. Both of the chosen models use a detection technique known as Haar-cascade, but anyone with a yearning to try something different is more than welcome to pick another type from OpenCV’s library. There are different ways to detect something like an edge, and the documentation provided by the OpenCV team does a great job of explaining the nitty gritty of some of these methods. At its core, Haar-cascade detection uses “edge features”, “line features”, and “four-rectangle features”, and more specifically the difference between the summed pixel values under the white and black regions of each feature, to pick out contrast patterns such as edges. This detection, combined with the trained model that has been given lots of “positive” (face included) and “negative” (no face) data, produces a toolset that we can use to find faces in a picture or video pretty accurately. Very soon after reading the documentation, I was able to draw a nice green rectangle around an individual’s face within a scene, and it would also work for multiple people, albeit a little buggy at times.

With the face tracking out of the way, so to speak, my next step was to attempt a fun demo incorporating a cartoon ninja mask laid over someone’s face. The naive approach would lead us to believe that placing an image in a video stream is as simple as adding a “layer” to said video that displays our image on “top”. OpenCV doesn’t work in this manner when combining images. Instead, it opts for bitwise operations on both images. What this entails is creating a mask of the smaller image that will be placed on “top”, using the mask to determine the actual footprint of that image on the main picture or video, and subtracting that data from the existing image. After that is finished, we have blocked out the exact space that the new image will take up, and we can add our new data to the existing frame. Without the blocking technique, the two images are simply blended together, which gives the image on top a sort of see-through opacity, and that isn’t what we want.

Though there are stutters and dropped frames from time to time in the video, all of this combined to become a very fun experiment that I hope will inspire others to take a stab at this implementation of computer vision. My next steps for this project will be to fix the scaling issues that exist on the image that we are using as a mask, as well as add some type of menu so a user can simply choose their own silly mask, or possibly upload their own image.