Machine Learning in Xamarin.Forms with ONNX Runtime
Machine learning can be used to add smart capabilities to mobile applications and enhance the user experience. There are situations where on-device inferencing is required or preferable to cloud-based solutions. Key drivers include:
- Availability: works when the device is offline
- Privacy: data can remain on the device
- Performance: on-device inferencing is often faster than sending data to the cloud for processing
- Cost efficiency: on-device inferencing incurs no cloud processing costs and reduces data transfers between device and cloud
Android and iOS provide built-in capabilities for on-device inferencing with technologies such as TensorFlow Lite and Core ML.
ONNX Runtime has recently added support for Xamarin and can be integrated into your mobile application to execute cross-platform on-device inferencing of ONNX (Open Neural Network Exchange) models. It already powers machine learning models in key Microsoft products and services across Office, Azure, and Bing, as well as in community projects. You can create your own ONNX models using services such as Azure Custom Vision, or convert existing models to ONNX format.
This post walks through an example demonstrating the high-level steps for leveraging ONNX Runtime in a Xamarin.Forms app for on-device inferencing. An existing open-source image classification model (MobileNet) from the ONNX Model Zoo has been used for this example.
Getting Started
Our sample Xamarin.Forms app classifies the primary object in the provided image, a golden retriever in this case, and displays the result with the highest score.
The following changes were made to the Blank App template, which includes a common .NET Standard project along with Android and iOS targets:
- Updating NuGet packages for the entire solution
- Adding the OnnxRuntime NuGet package to each project
- Adding the sample photo, classification labels, and MobileNet model to the common project as embedded resources
- Setting the C# language version for the common project to Latest Major
- Updating Info.plist in the iOS project to specify a minimum system version of 11.0
- Adding a new class called MobileNetImageClassifier to handle the inferencing
- Updating the templated MainPage XAML to include a Run Button
- Handling the Button Clicked event in the MainPage code-behind to run the inferencing
Inferencing With ONNX Runtime
We'll focus on two classes from the common project: MobileNetImageClassifier and MainPage.
MobileNetImageClassifier
MobileNetImageClassifier encapsulates the use of the model via ONNX Runtime as per the model documentation. You can see visualizations of the model’s network architecture, including the expected names, types, and shapes (dimensions) for its inputs and outputs using Netron.
This class exposes two public methods: GetSampleImageAsync and GetClassificationAsync. The former loads a sample image for convenience and the latter performs inferencing on the supplied image. Here’s a breakdown of the key steps.
Initialization
Initialization involves loading the embedded resource files representing the model, labels, and sample image. The asynchronous initialization pattern is used to simplify use downstream while preventing access to those resources before the initialization work has completed. Our constructor starts the asynchronous initialization by calling InitAsync, and the resulting Task is stored so several callers can await the completion of the same operation; in this case, those callers are the GetSampleImageAsync and GetClassificationAsync methods (a sketch of these methods follows the initialization code below).
const int DimBatchSize = 1;
const int DimNumberOfChannels = 3;
const int ImageSizeX = 224;
const int ImageSizeY = 224;
const string ModelInputName = "input";
const string ModelOutputName = "output";
byte[] _model;
byte[] _sampleImage;
List<string> _labels;
InferenceSession _session;
Task _initTask;
...
public MobileNetImageClassifier()
{
    _ = InitAsync();
}

Task InitAsync()
{
    if (_initTask == null || _initTask.IsFaulted)
        _initTask = InitTask();

    return _initTask;
}
async Task InitTask()
{
    var assembly = GetType().Assembly;

    // Get labels
    using var labelsStream = assembly.GetManifestResourceStream($"{assembly.GetName().Name}.imagenet_classes.txt");
    using var reader = new StreamReader(labelsStream);
    string text = await reader.ReadToEndAsync();
    _labels = text.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries).ToList();

    // Get model and create session
    using var modelStream = assembly.GetManifestResourceStream($"{assembly.GetName().Name}.mobilenetv2-7.onnx");
    using var modelMemoryStream = new MemoryStream();
    modelStream.CopyTo(modelMemoryStream);
    _model = modelMemoryStream.ToArray();
    _session = new InferenceSession(_model);

    // Get sample image
    using var sampleImageStream = assembly.GetManifestResourceStream($"{assembly.GetName().Name}.dog.jpg");
    using var sampleImageMemoryStream = new MemoryStream();
    sampleImageStream.CopyTo(sampleImageMemoryStream);
    _sampleImage = sampleImageMemoryStream.ToArray();
}
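For completeness, here’s a minimal sketch of how the two public methods await the shared initialization task before touching the loaded resources. The structure shown here is assumed rather than copied from the sample, and the classification body is elided:

public async Task<byte[]> GetSampleImageAsync()
{
    // Ensure initialization has completed (or restart it if it previously faulted)
    await InitAsync();
    return _sampleImage;
}

public async Task<string> GetClassificationAsync(byte[] image)
{
    await InitAsync();

    // The preprocessing, inferencing, and postprocessing steps described below
    // run here and return the winning label
    ...
}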
Preprocessing
Raw images must be transformed according to the requirements of the model so they match how the model was trained. Image data is then stored in a contiguous sequential block of memory, represented by a Tensor object, as input for inferencing.
Our first step in this process is to resize the original image, if necessary, so that its height and width are at least 224 pixels. In this case, images are first resized so the shortest edge is 224, then center-cropped so the longest edge is also 224. For the purposes of this example, SkiaSharp has been used to handle the requisite image processing.
using var sourceBitmap = SKBitmap.Decode(image);
var pixels = sourceBitmap.Bytes;

if (sourceBitmap.Width != ImageSizeX || sourceBitmap.Height != ImageSizeY)
{
    float ratio = (float)Math.Min(ImageSizeX, ImageSizeY) / Math.Min(sourceBitmap.Width, sourceBitmap.Height);

    using SKBitmap scaledBitmap = sourceBitmap.Resize(new SKImageInfo(
        (int)(ratio * sourceBitmap.Width),
        (int)(ratio * sourceBitmap.Height)),
        SKFilterQuality.Medium);

    var horizontalCrop = scaledBitmap.Width - ImageSizeX;
    var verticalCrop = scaledBitmap.Height - ImageSizeY;
    var leftOffset = horizontalCrop == 0 ? 0 : horizontalCrop / 2;
    var topOffset = verticalCrop == 0 ? 0 : verticalCrop / 2;

    var cropRect = SKRectI.Create(
        new SKPointI(leftOffset, topOffset),
        new SKSizeI(ImageSizeX, ImageSizeY));

    using SKImage currentImage = SKImage.FromBitmap(scaledBitmap);
    using SKImage croppedImage = currentImage.Subset(cropRect);
    using SKBitmap croppedBitmap = SKBitmap.FromImage(croppedImage);

    pixels = croppedBitmap.Bytes;
}
The second step is to normalize the resulting image pixels and store them in a flat array that can be used to create the Tensor object. In this case, the model expects the R, G, and B values to be in the range [0, 1], normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. The loop below iterates over the image pixels one row at a time, applies the requisite normalization to each value, then stores each in the channelData array. Our channelData array stores the normalized R, G, and B values sequentially: first all the R values, then all the G, then all the B (instead of the original interleaved sequence, i.e. RGB, RGB, etc.).
var bytesPerPixel = sourceBitmap.BytesPerPixel;
var rowLength = ImageSizeX * bytesPerPixel;
var channelLength = ImageSizeX * ImageSizeY;
var channelData = new float[channelLength * 3];
var channelDataIndex = 0;

for (int y = 0; y < ImageSizeY; y++)
{
    var rowOffset = y * rowLength;

    for (int x = 0, columnOffset = 0; x < ImageSizeX; x++, columnOffset += bytesPerPixel)
    {
        var pixelOffset = rowOffset + columnOffset;

        var pixelR = pixels[pixelOffset];
        var pixelG = pixels[pixelOffset + 1];
        var pixelB = pixels[pixelOffset + 2];

        var rChannelIndex = channelDataIndex;
        var gChannelIndex = channelDataIndex + channelLength;
        var bChannelIndex = channelDataIndex + (channelLength * 2);

        channelData[rChannelIndex] = (pixelR / 255f - 0.485f) / 0.229f;
        channelData[gChannelIndex] = (pixelG / 255f - 0.456f) / 0.224f;
        channelData[bChannelIndex] = (pixelB / 255f - 0.406f) / 0.225f;

        channelDataIndex++;
    }
}
This channelData array is then used to create the requisite Tensor object as input to the InferenceSession Run method. The dimensions (1, 3, 224, 224) represent the shape of the Tensor required by the model; in this case, mini-batches of 3-channel RGB images with a height and width of 224.
var input = new DenseTensor<float>(channelData, new[]
{
    DimBatchSize,
    DimNumberOfChannels,
    ImageSizeX,
    ImageSizeY
});
Inferencing
An InferenceSession is the runtime representation of an ONNX model. It’s used to run the model with a given input, returning the computed output values. Both the input and output values are collections of NamedOnnxValue objects, each representing a name-value pair of a string name and a Tensor object.
using var results = _session.Run(new List<NamedOnnxValue>
{
    NamedOnnxValue.CreateFromTensor(ModelInputName, input)
});
Postprocessing
This model outputs a score for each classification. For simplicity in this example, our code resolves the output Tensor by name and takes the entry with the highest score. The corresponding label item is then used as the return value for the GetClassificationAsync method. Additional work would be required to calculate the softmax probability if you wanted to include an indication of confidence alongside the label, as sketched after the code below.
var output = results.FirstOrDefault(i => i.Name == ModelOutputName);
var scores = output.AsTensor<float>().ToList();
var highestScore = scores.Max();
var highestScoreIndex = scores.IndexOf(highestScore);
var label = _labels.ElementAt(highestScoreIndex);
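If you did want a confidence value, a rough sketch of that extra softmax step over the raw scores might look like this (not part of the sample code):

// Softmax over the raw scores; subtracting the max first keeps Exp numerically stable
var expScores = scores.Select(s => Math.Exp(s - highestScore)).ToList();
var sumExp = expScores.Sum();
var confidence = expScores[highestScoreIndex] / sumExp;
// e.g. return $"{label} ({confidence:P1})";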
MainPage
Our MainPage XAML features a single Button for running the inference via the MobileNetImageClassifier. The Button’s Clicked handler resolves the sample image, passes it to the GetClassificationAsync method, and displays the result via an alert (a fuller sketch of the handler follows the snippet below).
var sampleImage = await _classifier.GetSampleImageAsync();
var result = await _classifier.GetClassificationAsync(sampleImage);
await DisplayAlert("Result", result, "OK");
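For context, a minimal sketch of the surrounding Clicked handler in the MainPage code-behind might look like this (the handler and field names are illustrative rather than taken from the sample):

readonly MobileNetImageClassifier _classifier = new MobileNetImageClassifier();

async void RunButton_Clicked(object sender, EventArgs e)
{
    var sampleImage = await _classifier.GetSampleImageAsync();
    var result = await _classifier.GetClassificationAsync(sampleImage);
    await DisplayAlert("Result", result, "OK");
}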
Optimizations and Tips
This first-principles example demonstrates basic inferencing with ONNX Runtime and leverages the default options for the most part. There are several optimizations recommended by the ONNX Runtime documentation that can be particularly beneficial for mobile.
Reuse InferenceSession objects
You can accelerate inference speed by reusing the same InferenceSession across multiple inference runs to avoid unnecessary allocation/disposal overhead.
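The sample already follows this guidance by creating the InferenceSession once during initialization and holding it in a field. As a rough sketch of the idea (field names assumed):

// Prefer: create the session once (as InitTask does) and reuse it across calls
_session ??= new InferenceSession(_model);

// Avoid: constructing a new session inside every inference call, which repeats
// model loading and graph optimization each time
// using var session = new InferenceSession(_model);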
Consider whether to set values directly on an existing Tensor or create using an existing array
Several examples create the Tensor object first and then set values on it directly. This is simpler and easier to follow, since it avoids the offset calculations used in this example. However, it’s faster to prepare a primitive array first and then use that to create the Tensor object. The trade-off here is between simplicity and performance (see the sketch below).
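As a rough illustration of the two approaches (the sizes, indices, and values here are placeholders):

// Option 1: create the tensor first and set values directly (simpler, no manual offsets)
var direct = new DenseTensor<float>(new[] { 1, 3, 224, 224 });
direct[0, 0, 10, 20] = 0.5f; // [batch, channel, y, x]

// Option 2 (used in this example): fill a flat float[] first, then wrap it
// (typically faster, at the cost of computing the channel offsets yourself)
var flat = new float[3 * 224 * 224];
flat[(0 * 224 * 224) + (10 * 224) + 20] = 0.5f; // same element as above
var fromArray = new DenseTensor<float>(flat, new[] { 1, 3, 224, 224 });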
Experiment with different EPs (Execution Providers)
ONNX Runtime executes models using the CPU EP (Execution Provider) by default. It’s possible to use the NNAPI EP (Android) or the Core ML EP (iOS) for ORT format models instead by specifying the appropriate SessionOptions when creating an InferenceSession. These may or may not offer better performance, depending on how much of the model can be run using NNAPI / Core ML and on the device capabilities. It’s worth testing with and without the platform-specific EPs, then choosing what works best for your model. There are also several options per EP; see NNAPI Options and Core ML Options for more detail on how these can impact performance and accuracy.
The platform-specific EPs can be configured via the SessionOptions. For example:
Use NNAPI on Android
options.AppendExecutionProvider_Nnapi();
Use Core ML on iOS
options.AppendExecutionProvider_CoreML(CoreMLFlags.COREML_FLAG_ONLY_ENABLE_DEVICE_WITH_ANE);
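Putting these together, a hedged sketch of creating a session with platform-specific options might look like the following. Note the platform-specific calls must be made from the respective platform projects, which is what SessionOptionsContainer, described next, helps with:

var options = new SessionOptions();

// Platform project (e.g. Android) appends its preferred EP
options.AppendExecutionProvider_Nnapi();

// The session then uses those options; anything the EP can't handle
// falls back to the default CPU EP
_session = new InferenceSession(_model, options);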
SessionOptionsContainer can be used to simplify use of platform-specific SessionOptions from common platform-agnostic code. For example:
Register named platform-specific SessionOptions configuration
// Android
SessionOptionsContainer.Register("sample_options",
    (sessionOptions) => sessionOptions.AppendExecutionProvider_Nnapi());

// iOS
SessionOptionsContainer.Register("sample_options",
    (sessionOptions) => sessionOptions.AppendExecutionProvider_CoreML(
        CoreMLFlags.COREML_FLAG_ONLY_ENABLE_DEVICE_WITH_ANE));
Use in common platform-agnostic code
// Apply named configuration to new SessionOptions
var options = SessionOptionsContainer.Create("sample_options");
...
// Alternatively, apply named configuration to existing SessionOptions
options.ApplyConfiguration("sample_options");
Quantize models to reduce size and execution time
If you have access to the data that was used to train the model, you can explore quantizing it. At a high level, quantization in ONNX Runtime involves mapping higher-precision floating point values to lower-precision 8-bit values. This topic is covered in detail in the ONNX Runtime performance tuning documentation, but the idea is to trade off some precision in order to reduce the size of the model and increase performance. It can be especially relevant for mobile apps, where package size, battery life, and hardware constraints are key considerations. You can test this by switching out the mobilenetv2-7.onnx model used in this example for the mobilenetv2-7-quantized.onnx model included in the same repo. Quantization can only be performed on models that use opset 10 and above; however, models using older opsets can be updated using the VersionConverter tool.
There are several pre-trained, ready-to-deploy models available
There are several existing ONNX models available, such as those highlighted in the ONNX Model Zoo collection. Not all of these are optimized for mobile, but you’re not limited to using models already in ONNX format. You can convert existing pre-trained, ready-to-deploy models from other popular sources and formats, such as PyTorch Hub, TensorFlow Hub, and scikit-learn. The following resources provide guidance on how to convert each of these respective models to ONNX format:
Your pre-processing must, of course, match how the model was trained, so you’ll need to find the model-specific instructions for the model you convert.