Neural Compute Stick 2 equipped with MYRIAD chip. Image by Author.

A brief guide to running your model everywhere

Applications of machine learning are endless nowadays. There are models out there already trained, and data that need nothing but a model to be fitted to.

When dealing with machine learning applications, it is often tricky to decide which hardware to train the model on and which to offload the inference to.

Intel® has introduced OpenVINO™, which helps deploy models (mostly related to computer vision) to any Intel® device, whether it is a CPU, GPU, FPGA or a Neural Compute Stick with a MYRIAD chip.


One of the common uses of this technology is fast inference (e.g. predicting objects live in a video from a webcam). Some models require powerful hardware to be trained, but a smaller and less expensive device is sufficient for inference. For example, one can take a complex pretrained model and then run it on light hardware such as a Raspberry Pi.

OpenVINO is a powerful tool, but it requires some steps in order to work properly. We will go through the process of converting a model, loading it into an Inference Engine plugin and performing inference.

Custom Models

The Intel framework has a collection of pretrained and optimised models one could use. But what if you want to train a custom model?

We will consider a model trained using TensorFlow, although OpenVINO supports many other frameworks. The steps are mostly similar.

I will use Google Colab to walk through all of this; it should be easily reproducible.

Download OpenVINO on Colab

First of all, we need to download the OpenVINO repository and install the prerequisites for TensorFlow.

!git clone https://github.com/openvinotoolkit/openvino.git
!cd openvino/model-optimizer/install_prerequisites/ && ./install_prerequisites.sh tf2

Use Keras as you are used to

In this step you can do whatever you want: create a custom Keras model with custom layers, compile it, train it on any dataset, and so on.
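As a minimal sketch, such a model could look like the following. The architecture and the random placeholder data are illustrative assumptions, not part of the original pipeline; substitute your own layers and dataset.

```python
import numpy as np
import tensorflow as tf

# A small, hypothetical classifier for 28x28 inputs (e.g. MNIST-like data).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Random placeholder data stands in for a real training set.
X_train = np.random.rand(32, 28, 28).astype("float32")
y_train = np.random.randint(0, 10, size=(32,))
model.fit(X_train, y_train, epochs=1, verbose=0)

# Save to .h5 (in Colab this would be /content/model.h5).
model.save("model.h5")
```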

When you are satisfied with your training, you can prepare the model to be loaded into the Inference Engine plugin. The model can then be used for inference on any kind of device.

Save the model from h5

The first step is to save the model into a .h5 file. This is easily done with the Keras API.

import tensorflow as tf
tf.saved_model.save(tf.keras.models.load_model("/content/model.h5"), "model")

Convert the model

Models need to be converted into a representation that the Inference Engine can load into devices for inference. This step can be done using the so-called model optimiser as shown below.

The model optimiser requires some input parameters, such as the input shape.

python3 openvino/model-optimizer/mo_tf.py --saved_model_dir model/ --input_shape [1,28,28]

At this point you should have two files, model.xml and model.bin, which contain all the information required to perform inference with your custom model.


An example of these conversion steps is available in the following notebook:

Google Colaboratory

Execute the model

When the converted model is ready, the next step is to use it for inference. This can be done on any kind of supported device; there is no need to use the same device on which you performed the training.

Install openVINO for inference

In order to run inference, it is necessary to install the OpenVINO framework. This can easily be done through pip.

!pip install --upgrade pip
!pip install openvino

Load the model to the plugin

We need to load the model onto a device for inference.

The Inference Engine Core object has the capabilities to read the .xml and .bin into a network object.

This network can be loaded onto any device, be it a CPU, GPU or MYRIAD. This is one of the powerful aspects of this API: it runs transparently on any kind of device without effort.

from openvino.inference_engine import IECore

model_xml = "model.xml"
model_bin = "model.bin"

ie = IECore()
net = ie.read_network(model=model_xml, weights=model_bin)
exec_net = ie.load_network(network=net, device_name="MYRIAD")

Get info about the topology

We can inspect the network topology. The sample below supports only a single input; anyway, if you are running this on a video, you need to perform inference on a single frame at a time, so this is not a big deal.

Moreover, we can inspect the input or the output shape, which is particularly useful for models trained by someone else: we need to be sure the data are in the right shape.

assert len(net.input_info.keys()) == 1, "Sample supports only single input topologies"
assert len(net.outputs) == 1, "Sample supports only single output topologies"
input_blob = next(iter(net.input_info.keys()))
out_blob = next(iter(net.outputs))
net.batch_size = 1  # we infer one sample at a time
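If your data do not match the expected layout, a quick NumPy operation adds the missing batch dimension. A sketch, assuming the 28x28 input shape used during conversion:

```python
import numpy as np

# Hypothetical 28x28 grayscale frame; the network expects shape (1, 28, 28).
frame = np.random.rand(28, 28).astype("float32")
batch = frame[np.newaxis, ...]  # prepend the batch dimension
print(batch.shape)  # → (1, 28, 28)
```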


And finally, it is time for the inference. Given a sample X_test[0], we are ready to run inference on it using the infer method of the executable network.

res = exec_net.infer(inputs={input_blob: X_test[0]})
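The result is a dictionary mapping output blob names to NumPy arrays; for a classifier, the predicted class is the argmax of the scores. Below is a sketch with a faked result dictionary — the blob name and the scores are made up for illustration:

```python
import numpy as np

# Faked inference result: infer() returns a dict of {blob name: array}.
out_blob = "dense_1"  # hypothetical output blob name
res = {out_blob: np.array([[0.01, 0.02, 0.80, 0.05, 0.02,
                            0.02, 0.02, 0.02, 0.02, 0.02]])}

scores = res[out_blob][0]  # scores for the single input sample
predicted_class = int(np.argmax(scores))
print(predicted_class)  # → 2
```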


We trained a custom model.

Using the OpenVINO model optimiser, we converted it into the representation required to load it into the Inference Engine module.

Once the trained model is ready, using the Inference Engine to run inference on input data is almost trivial. And as a bonus, this can be done on any kind of device.

Convert a Tensorflow2 model to OpenVINO® was originally published in Towards Data Science on Medium.