
A practical Workflow to turn Data Science into Software

There is growing frustration within the data science/machine learning community at seeing yet another PoC (Proof of Concept) produce promising results but never turn into something impactful. The main reason: a large gap between development and deployment.

There is an increasing number of tools and frameworks, open source as well as commercial, that promise to bridge this gap. However, introducing them into an organization is a time-consuming and expensive endeavor.

Fighting for such solutions is honorable and necessary. Still, there is something else we can do to narrow the gap in the meantime: aligning our workflow with software development principles. Number one on the list of things to learn: unit tests.

Unfortunately, most introductions and documentation deal with this topic in an abstract way. In contrast, this blog walks you through a practical workflow with a specific example. Let’s get started!


We are working at a start-up that is rapidly growing its customer base. While this is great news for the company, there are processes that need to be adapted to the new situation. One of them is email marketing.

When our company started to send marketing emails, the customer base was small and dedicated. Therefore, it was fine that all registered customers got all the emails. That is not true anymore. While there are still dedicated customers that don’t want to miss out on promotions and product updates, many new customers are less excited. Even worse, an increasing number of potential customers opt out from the marketing newsletter due to irrelevance or too much email.

After some discussion, the founders of our start-up decide to augment the marketing process. They envision a system that delivers emails to the customers who care about them and spares the rest the additional strain on their inbox.

Our job is to build the machine learning parts of that system.


Let’s summarize what we have and what we need:

  • We have some information on our customers.
  • We have some information on a planned marketing email.
  • We need a likelihood for each customer to interact with a given marketing email (excluding the interaction of opting out of the newsletter).

It is very, very tempting for data scientists and machine learning engineers to jump into collecting data and building sophisticated models. However, that is not what is needed from us right now. Remember, the goal is to build a complete system, not a standalone model. So, how can we align our work with these requirements?


We decide to work with a four-step workflow:

  1. Select the next component to develop.
  2. Write unit tests that capture what we want to achieve.
  3. Write code that passes the unit tests.
  4. Repeat.

Let’s get started!

Step 1: Select the next Component for Development

Right now, there is nothing in place at all. However, we know that other teams are starting to work on the overall system and we want to support them. Therefore, our first task is to create an interface for the overall process, which takes the two inputs and returns the desired output. Let’s call this main class CampaignPredictor. It takes a pandas data frame with customer data and a dictionary with information on the planned campaign, and has a method that returns, for each customer, the probability that they will interact with the email. It looks something like this:
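A minimal sketch of this interface (the attribute and method names are my assumptions):

```python
import pandas as pd


class CampaignPredictor:
    """Predict, for each customer, the probability of interacting
    with a planned marketing email."""

    def __init__(self, customers: pd.DataFrame, campaign: dict):
        self.customers = customers
        self.campaign = campaign

    def predict_interaction(self) -> list:
        """Return one interaction probability per customer."""
        raise NotImplementedError
```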

Note that I already included type hints, a (preliminary) docstring, and a NotImplementedError. It is very tempting to keep on coding at that point. However, we do ourselves a huge favor if we first state what our expectations for this piece of code are. The best way to express these expectations for us and teams that will integrate our code in the overall process: unit tests.

Step 2: Write Unit Tests that capture our Goals

Writing good unit tests is an art in its own right. Still, there are two dimensions of our code we want to cover with them. The first dimension is valid versus invalid inputs. In the case of valid inputs, we focus on the correct output. In the case of invalid inputs, we focus on error handling and fast failing. The former helps users understand what is going wrong, while the latter ensures that users don’t have to wait long when something goes wrong.

The second dimension is structural versus functional qualities. Structural unit tests cover the correct type and shape of input and output. Functional unit tests cover, well, the functionality of the code. That is, whether it provides what it promises to solve. Here is what these unit tests look like:

For invalid inputs, there are mainly two types of errors. TypeErrors capture all cases in which we expect one type of input and get another. For instance, we expect a pandas data frame but get a numpy matrix. ValueErrors capture situations in which the type is correct, but we cannot deal with the values themselves. For instance, we expect a dictionary that has a “campaign_type” key with a string as a value, but either the key is missing or the value is not a string. Both would cause problems later on, so we want to catch them early and provide a meaningful error message.

For valid inputs, there should not be any error. First, we expect an output with the correct type and shape. In our case, that is a list with the same length as the number of customers in the input. Second, we expect that all values in this list are valid probabilities. That is, they range from 0 to 1.

As you can see, we covered both structural and functional aspects for valid and invalid inputs. Since all these tests fail right now, everything runs as expected. That is, not at all. Let’s fix this.

Step 3: Build what’s necessary to pass the Tests (First Attempt)

While there is no explicit order of the tests, we start with the tests for invalid inputs. This helps us to clarify what we actually expect to get before doing something with it.

Since our main goal is to fail fast on invalid input, we will write our verification code as part of the class initialization. In our example, we need to make sure that we get a dictionary with the expected key-value pairs and a data frame with at least one row, that is, at least one customer to select from.
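One way to write that validation (the exact checks and error messages are my choice):

```python
import pandas as pd


class CampaignPredictor:
    """Predict, for each customer, the probability of interacting
    with a planned marketing email."""

    def __init__(self, customers: pd.DataFrame, campaign: dict):
        # Fail fast: check types first, then values.
        if not isinstance(customers, pd.DataFrame):
            raise TypeError(
                f"customers must be a pandas DataFrame, got {type(customers).__name__}"
            )
        if not isinstance(campaign, dict):
            raise TypeError(
                f"campaign must be a dict, got {type(campaign).__name__}"
            )
        if customers.empty:
            raise ValueError("customers must contain at least one row")
        if not isinstance(campaign.get("campaign_type"), str):
            raise ValueError(
                "campaign needs a 'campaign_type' key with a string value"
            )
        self.customers = customers
        self.campaign = campaign

    def predict_interaction(self) -> list:
        raise NotImplementedError
```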

This code passes the test for invalid inputs. While working on this, you probably get additional ideas on what you should test for. In general, that is a good thing because it shows that you are starting to think about your approach in more detail. However, don’t be too specific yet, because things might change later on.

It is very tempting to look at the second test and — finally! — start with modeling. However, that is not what we are aiming for right now. We are aiming for the overall process. At the same time, there are many different ways to predict a customer’s probability to interact with an email.

The solution? We go one step deeper down the rabbit hole. The beauty of unit tests is that a failing test will remind us that there is something left to fix. Let’s implement an easy way to select customers for campaigns first: naive randomization.

Steps 1–3 in Time Lapse

Step 1: We are now building a blueprint for more complex ways to select customers. This placeholder should be fast for easy integration tests — and not completely stupid. Let’s call it the RandomSelector class. It takes a number of customers to choose from and the percentage of them that should receive the email. It then applies a method to select a random set of customers and returns the results as a list.
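A sketch of that interface (names are assumptions, as before):

```python
class RandomSelector:
    """Placeholder selection strategy: pick a random subset of customers."""

    def __init__(self, num_customers: int, percentage_to_select: float):
        self.num_customers = num_customers
        self.percentage_to_select = percentage_to_select

    def select(self) -> list:
        """Return a list with one entry per customer: 1 if selected, 0 if not."""
        raise NotImplementedError
```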

Step 2: We write unit tests for valid and invalid inputs. Concerning the inputs, we want to make sure that the number of customers is more than one and that the percentage to select is between 0 and 1. If the inputs are not in line with this, we want to see the correct type of error. Concerning the outputs, we want to get a list with the correct length and with at least one selected customer.

While writing the tests, I also included an edge case I came up with along the way: even a tiny selection percentage should still select at least one customer. Again, unit tests help to clarify what you are actually expecting to get back, which makes the coding afterward way more focused.

Step 3: We update the class initialization to check the inputs and the method to produce the correct output. Once we pass our tests, we have finished this task.
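Put together, the class could look like this (validation details are my choices):

```python
import random


class RandomSelector:
    """Placeholder selection strategy: pick a random subset of customers."""

    def __init__(self, num_customers: int, percentage_to_select: float):
        if not isinstance(num_customers, int):
            raise TypeError(
                f"num_customers must be an int, got {type(num_customers).__name__}"
            )
        if not isinstance(percentage_to_select, (int, float)):
            raise TypeError(
                "percentage_to_select must be a number, "
                f"got {type(percentage_to_select).__name__}"
            )
        if num_customers <= 1:
            raise ValueError("num_customers must be greater than one")
        if not 0 < percentage_to_select <= 1:
            raise ValueError("percentage_to_select must be between 0 and 1")
        self.num_customers = num_customers
        self.percentage_to_select = percentage_to_select

    def select(self) -> list:
        # Edge case: always select at least one customer, even for
        # tiny percentages that would otherwise round down to zero.
        num_selected = max(1, round(self.num_customers * self.percentage_to_select))
        chosen = set(random.sample(range(self.num_customers), num_selected))
        return [1 if i in chosen else 0 for i in range(self.num_customers)]
```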

Now we can use this new class to take care of what we started with: the CampaignPredictor.

Step 3: Build what’s necessary to pass the Tests (Second Attempt)

If we correctly translated our requirements from the CampaignPredictor to the RandomSelector class, we need hardly any additional code to pass the tests for valid inputs:
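With RandomSelector in place, predict_interaction only needs to delegate. Condensed here, with the validation from the earlier snippets omitted for brevity (the 50% default is my assumption):

```python
import random

import pandas as pd


class RandomSelector:
    # Input validation from the earlier snippet omitted for brevity.
    def __init__(self, num_customers: int, percentage_to_select: float):
        self.num_customers = num_customers
        self.percentage_to_select = percentage_to_select

    def select(self) -> list:
        num_selected = max(1, round(self.num_customers * self.percentage_to_select))
        chosen = set(random.sample(range(self.num_customers), num_selected))
        return [1 if i in chosen else 0 for i in range(self.num_customers)]


class CampaignPredictor:
    # Input validation from the earlier snippet omitted for brevity.
    def __init__(self, customers: pd.DataFrame, campaign: dict):
        self.customers = customers
        self.campaign = campaign

    def predict_interaction(self) -> list:
        # Delegate to the placeholder; a trained model can replace it
        # later without changing this interface.
        selector = RandomSelector(
            num_customers=len(self.customers),
            percentage_to_select=0.5,  # assumed placeholder default
        )
        return selector.select()
```

A 0 or 1 per customer is a valid, if extreme, probability, so the functional tests pass as well.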

All tests pass and we can celebrate our first iteration. Before we start the second one, that is a good moment to update the docstring and commit the code to our version control system.

Step 4: Repeat

Our next step depends on business priorities. In this scenario, randomization will not be acceptable, so we need to integrate a piece of code that is able to load a trained model and apply it to the provided dataset. I leave it as an exercise to think about what valid and invalid inputs look like and what such a module needs to deliver.


This workflow and the scenario are simplified. However, the trivial observation that each case is complicated in its own way is no justification to complicate everything. Instead, there are only four questions to answer while building the next component:

  1. What are the inputs I expect?
  2. How do I deal with invalid or incomplete inputs?
  3. What do I want the output to look like from a structural perspective?
  4. What do I want my output to achieve from a functional perspective?

Admittedly, answering these questions can feel annoying. I think there are mainly two reasons for that:

  1. These questions are surprisingly hard. Good. You are realizing that you are not sure what you want to achieve. Better now than after 1000 additional lines of code.
  2. It feels slow and verbose, especially if deadlines are looming. Feeling fast is way easier if you don’t care where you are heading. However, an unclear sense of direction is the worst while under pressure.

For additional examples and more details on tests themselves, see my last blog post and the references included in it:

How to use Test Driven Development in a Data Science Workflow

Let me know in the comments or on Twitter if this post helped you or if you want to add something. I’m also happy to connect on LinkedIn. Thanks for reading!

How to bridge Machine Learning and Software Engineering was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.