(image by author)

MLOps (machine learning operations) is key for improving business productivity by getting data science to production. Therefore, it’s essential for any company that wants to gain a competitive edge with AI. In this post, I’ll cover six tips and best practices to accelerate and simplify your path to production, based on my experience with enterprises worldwide. It is an overview of a talk I gave at NVIDIA’s latest GTC conference, which you can view here. I hope you find it useful!

Challenge #1: A long and difficult path to production

Developing the model in research is just the beginning. There are still many steps to complete before the model works online and generates value in real business applications. These include:

  • Packaging code, scripts, dependencies and configuration into containers
  • Collecting and preparing data from operational and online sources at scale
  • Training the models with large-scale data, various algorithms and parameters
  • Incorporating the models into real-time/interactive pipelines, which feed from live data
  • Monitoring model accuracy and creating automated re-training workflows
  • Adding instrumentation, tracking, versioning, and security to every step
  • Refactoring code for performance in order to meet application SLA
  • Creating continuous development and delivery (CI/CD) pipelines

It can take months to complete these processes, and they are resource intensive.

The Solution: Incorporate automation and adopt a serverless approach

Automating the production process can shorten it from months to weeks. This can be done with serverless frameworks that automatically transform simple code or notebooks into managed and elastic microservices. Open source tools can help, like MLRun, an open source MLOps orchestration framework. (Full disclosure: my team and I maintain it.) MLRun automatically converts code into powerful serverless functions and allows simple composition of multiple functions into batch, CI/CD, or real-time pipelines.

MLRun functions are fully managed with automated tracking, monitoring, logging, versioning, security, and horizontal scalability, which help to eliminate much of the development and DevOps efforts. They also accelerate time to production without compromising on performance or scalability.

In MLRun, we can use our own functions or the marketplace functions for data preparation, training, model testing, serving, etc. Then, we can compose complex ML pipelines in minutes, and run them at scale with full experiment tracking and artifact versioning.
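MLRun's actual SDK is beyond the scope of this post, but the core idea it automates, wrapping plain Python functions so every run is tracked with its parameters, results, and timing, can be sketched in a few lines of plain Python (the names here are illustrative, not MLRun's API):

```python
import functools
import time

RUN_DB = []  # stands in for a run database with experiment tracking

def tracked(fn):
    """Log parameters, results, and duration for every run of `fn`."""
    @functools.wraps(fn)
    def wrapper(**params):
        start = time.time()
        result = fn(**params)
        RUN_DB.append({
            "function": fn.__name__,
            "params": params,
            "result": result,
            "duration_s": time.time() - start,
        })
        return result
    return wrapper

@tracked
def train(lr, epochs):
    # stand-in for a real training step
    return {"accuracy": round(1 - lr, 2)}

train(lr=0.01, epochs=10)
print(RUN_DB[0]["function"], RUN_DB[0]["params"])
```

A framework like MLRun applies this pattern automatically, and adds versioning, artifact storage, and horizontal scaling on top, so you don't have to build the wrapper yourself.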

MLRun development flow (picture by author)

MLRun supports multiple serverless runtime engines (job, Dask, Spark, MPI/Horovod, Nuclio, etc.) to address various workloads at scale (data analytics, machine learning, deep learning, streaming, APIs, serving, and more). It also supports multiple pipeline/CI engines for running the ML workflows, including Kubeflow Pipelines, GitHub Actions, GitLab CI/CD, etc.

Functions can also be used to compose real-time production pipelines, handle stream processing, data enrichment, model serving, ensembles, monitoring, and more. MLRun real-time pipelines are built on top of the popular high-performance open source serverless framework Nuclio (which my team and I maintain as well).
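The shape of a real-time serving pipeline, an ordered graph of steps that each event flows through, can be illustrated with a toy pure-Python sketch (the real MLRun/Nuclio API differs; this only shows the concept):

```python
class ServingGraph:
    """Toy real-time serving graph: an ordered chain of processing steps."""
    def __init__(self):
        self.steps = []

    def to(self, step):
        self.steps.append(step)
        return self  # enables chaining: graph.to(a).to(b)

    def run(self, event):
        for step in self.steps:
            event = step(event)
        return event

def enrich(event):
    # data enrichment: derive features from the raw value
    event["features"] = [event["value"], event["value"] ** 2]
    return event

def predict(event):
    # stand-in "model": threshold on the feature sum
    event["prediction"] = sum(event["features"]) > 10
    return event

graph = ServingGraph().to(enrich).to(predict)
print(graph.run({"value": 3}))  # {'value': 3, 'features': [3, 9], 'prediction': True}
```

In a real deployment, each step could be an independent, scalable microservice, which is exactly what building on a serverless engine like Nuclio gives you.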

MLRun Serving Graph Example (picture by author)

Check out the example notebooks for serving graph and distributed NLP pipeline to see how easy it is to deploy real-time production pipelines with MLRun.

Challenge #2: Inefficiency due to effort duplication & siloed work

Machine learning work is divided among different teams: data science, engineering, and MLOps. Each of these teams works with different toolsets, making it very difficult to collaborate and forcing resource-intensive code and pipeline refactoring. In addition, in the current practice, teams end up building three separate pipeline architectures to address the research, production, and governance needs, which leads to resource waste and code refactoring or reimplementation.

Data Science Pipelines (picture by author)

We can see from the diagram that the data transformation logic is not the same in the research and production pipelines, leading to inaccurate features and model results. Not to mention the effort duplication.

The Solution: Consolidate platforms and processes for data science, data engineering, and DevOps

Consolidating all the work into a unified, production-minded workflow and technology stack enables teams to collaborate effectively. It also leads to improved accuracy and rapid, ongoing deployment of AI across the organization, serving the needs of all stakeholders.

For example, data scientists will be able to build features without needing to constantly ask for data from data engineers, MLOps teams will be able to redeploy without reengineering, and so on. In addition, utilizing a feature store at the heart of the platform allows unifying the data collection, transformation, and cataloging process for all three use cases (research, production and governance), saving time and effort in one of the most labor-intensive parts of the MLOps process.

Having one end-to-end platform that consolidates it all: the feature store, ML training pipelines, model serving and monitoring, is important for enabling acceleration and automation. Such a platform saves the time, effort and frustration of having to stitch together different components to get a full enterprise-grade working solution.

MLOps Platform Architecture (picture by author)

Challenge #3: Under-utilized resources and limited scaling

When every developer or team has their own dedicated hardware, VMs, or GPUs, the result is infrastructure silos. These silos lead to a waste of resources and to management complexity. They also limit our ability to process data at scale or shorten execution time, compared to if we clustered all those resources together.

Enterprises want to capitalize on the investment they make in AI infrastructure and ensure their compute, data, and GPU resources are fully utilized.

The Solution: Use a shared and elastic pool of resources

Pooling resources together into an elastic cluster that can dynamically shrink or grow will enable sharing resources, smarter allocation, and resource-saving. Find a solution that accelerates time to production and makes the most out of your AI infrastructure.

Here is one such example:

(picture by author)

By taking servers and VMs, running Kubernetes on top of the cluster, adding a data layer for moving data across microservices, and scheduling workloads on top, you should be able to gain better scale and performance. Include a suite of tools and services for the different aspects, development environments, and tracking in your stack. This will accelerate time to production, reduce resource consumption, keep the pool shared, and allow larger-scale workloads to run as needed.
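The benefit of a shared pool over per-team silos can be illustrated with a toy scheduler that greedily places each job on the least-loaded worker (purely illustrative; real cluster schedulers such as Kubernetes are far more sophisticated):

```python
def schedule(jobs, workers):
    """Greedy least-loaded placement: assign each job (largest first)
    to whichever worker currently has the lightest load."""
    load = {w: 0 for w in workers}
    placement = {}
    for job, cost in sorted(jobs.items(), key=lambda kv: -kv[1]):
        target = min(load, key=load.get)
        load[target] += cost
        placement[job] = target
    return placement, load

# with silos, "train-a" alone would need a dedicated 8-unit machine;
# with a shared pool, all four jobs fit on two workers
jobs = {"train-a": 8, "train-b": 5, "etl": 3, "serve": 2}
placement, load = schedule(jobs, ["gpu-1", "gpu-2"])
print(load)
```

The point is not the algorithm itself but the economics: pooled, elastic capacity lets heavy and light workloads share the same hardware instead of each team over-provisioning its own.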

Challenge #4: Handling data & features takes 80% of your team’s time

Handling data requires a lot of work. While AutoML tools have become more widespread and simplify the model-building part, you still constantly need to use, prepare, store, and ingest application-specific data. Data handling is among the most time-consuming and resource-intensive tasks in ML.

The Solution: Use feature stores to simplify feature engineering

Feature engineering transforms our raw data into something meaningful that feeds ML workloads. Examples include adding activity times, aggregating data and calculating statistics, joining data from different sources, running sentiment analysis, rotating pictures, and more. These features can then be used for offline training, in the online production pipeline, and for monitoring and governance.
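As a concrete (toy) example of such a transformation, here is a small aggregation that turns raw purchase events into per-user features, a count and a recent-window mean, of the kind a feature store would compute and serve:

```python
from collections import defaultdict
from statistics import mean

def user_features(events, window=3):
    """Aggregate raw purchase events into per-user features:
    total count and mean amount over the last `window` events."""
    history = defaultdict(list)
    for e in events:
        history[e["user"]].append(e["amount"])
    return {
        user: {
            "count": len(amounts),
            "mean_recent": round(mean(amounts[-window:]), 2),
        }
        for user, amounts in history.items()
    }

events = [
    {"user": "alice", "amount": 10.0},
    {"user": "bob",   "amount": 4.0},
    {"user": "alice", "amount": 30.0},
]
print(user_features(events))
```

In production, the same aggregation would run continuously over a live event stream rather than a static list, which is exactly the capability to look for in a feature store.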

The solution to this challenge is a feature store. A feature store is a central place for building a pipeline that automatically creates features for training and production. Tech giants like Uber, Twitter, and Spotify built their own feature stores, and there are a few open-source and commercial feature store tools. When you select yours, make sure it suits your needs. For example, if you are planning to ingest streaming data, check whether the feature store can run calculations on the fly and perform joins, aggregations, etc., and how well it integrates with the other MLOps components.
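The key property a feature store provides, defining each transformation once and reusing it for both offline training and online serving, can be sketched as follows (a minimal toy, not any specific product's API):

```python
class FeatureStore:
    """Toy feature store: transformations are registered once and shared
    by the offline (training) and online (serving) paths."""
    def __init__(self):
        self.transforms = {}
        self.online = {}  # latest feature values, keyed by (feature, entity)

    def register(self, name, fn):
        self.transforms[name] = fn

    def ingest(self, name, entity, raw):
        # the registered transformation feeds the online store...
        self.online[(name, entity)] = self.transforms[name](raw)

    def get_online(self, name, entity):
        return self.online[(name, entity)]

    def offline_batch(self, name, rows):
        # ...and the exact same transformation builds the training set
        return [self.transforms[name](r) for r in rows]

store = FeatureStore()
store.register("spend_usd", lambda raw: round(raw["cents"] / 100, 2))

training = store.offline_batch("spend_usd", [{"cents": 199}, {"cents": 950}])
store.ingest("spend_usd", "alice", {"cents": 199})
print(training, store.get_online("spend_usd", "alice"))
```

Because both paths share one definition, the training/serving skew described in the earlier diagram (different transformation logic in research versus production) cannot occur.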

Challenge #5: Inaccurate models and limited visibility

Models will only stay accurate if the data, assumptions, and behavior stay the same. But that is rarely the case, since these change all the time, leading to model drift and inaccuracy.

For example, Covid-19 changed consumer behavior significantly, from buying more food to taking fewer flights. If your models were not modified accordingly, they would have produced invalid and inaccurate results.

We must constantly monitor our data and models and alert when they deviate. If these changes aren't visible and identified in time, the models will not be accurate, which can lead to negative business outcomes or legal exposure.

The Solution: Make sure model monitoring and re-training is built into your MLOps pipeline

Include a feedback and drift-aware system in your MLOps pipeline to measure the difference between predicted and actual results. The system should trigger alerts so you can act on the change, for example by retraining or switching the model.

This feedback system should track everything in a live stream, with functions pulling statistical information from the feature store and comparing it with real-time behavior. If misbehavior is identified, an alert is sent to a microservice, which can then notify you by email, retrain or switch the model, or run any other custom operation.
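A minimal sketch of this kind of drift detection, comparing live feature statistics against a baseline captured at training time and flagging large deviations (the threshold and z-score test here are illustrative choices, not a prescribed method):

```python
from statistics import mean

def detect_drift(baseline, live_values, threshold=3.0):
    """Alert if the live mean deviates from the training-time baseline
    by more than `threshold` baseline standard deviations."""
    live_mean = mean(live_values)
    z = abs(live_mean - baseline["mean"]) / baseline["std"]
    return {"z_score": round(z, 2), "drift": z > threshold}

baseline = {"mean": 50.0, "std": 5.0}  # stats captured when the model was trained
print(detect_drift(baseline, [49, 52, 51, 48]))  # in line with training data
print(detect_drift(baseline, [80, 85, 78, 82]))  # large shift -> trigger retraining
```

In a real pipeline, the baseline would come from the feature store and the result would be published to an alerting/retraining microservice, as described above.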

Production Pipeline With Monitoring (picture by author)

When using MLRun, model activities are tracked in databases and visualized using Grafana or MLRun dashboards. Here are some dashboard snapshots:

Model Monitoring in MLRun & Grafana (picture by author)

Challenge #6: A constantly evolving toolset & steep learning curve

MLOps and data science have a lot of technology behind them, and there is a tremendous and growing number of tools out there. It can be overwhelming to understand what you need to get started, or to support a specific use case. Factors like data locality, security, and collaboration all need to be considered. Often, as soon as you've made the decision on your AI infrastructure software stack, something new comes along and alters the equation.

The Solution: Adopt an ‘open yet managed’ state of mind

You want your stack to be future-proof, and you want to be able to harness the latest updates from the growing data science community. I agree with that approach and will always recommend using open-source tools as much as possible. In addition to their quality, the open-source community provides a lot of answers and support. However, depending on your needs, it may be overwhelming to integrate, maintain, secure, and support them all yourself. That's where a pre-baked stack of the leading open-source tools, along with a managed service experience, can help you focus your efforts on building your AI applications.

MLOps can be daunting, but sharing our collective experience will make us all smarter and faster. I hope you found my article helpful, and if you'd like, you can watch my full talk here.


6 Tips for MLOps Acceleration & Simplification was originally published in Towards Data Science on Medium.