A lot of people know how to build ML models, but surprisingly few are comfortable with deployment. Deploying models is a necessary skill in industry, and the first step is running our ML models on the web during development, for demos and testing. This can be done with a simple development server, which a framework like Flask lets us build in just a few lines of code. In this article, we will build a very simple development server that renders the output predictions of our image classifier model (written in PyTorch) on a public webpage. All of this code can be run on Google Colab, without installing anything on our local machine.

Google Colab provides a virtual machine environment, so unlike when running Flask on our local machine, we cannot access localhost. Hence, we need to expose it to a public URL using the library flask-ngrok, which generates a temporary URL where the web app runs.

First, let’s create two folders called images and references in the current project directory, where we will store images (to be classified by our classifier) and a dictionary mapping class indices to human-understandable class names.

 import os

 # Use the current working directory as the project root.
 PROJECT_ROOT_DIR = os.getcwd()
 IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, 'images')
 os.makedirs(IMAGES_PATH, exist_ok=True)
 REFERENCE_FILES_PATH = os.path.join(PROJECT_ROOT_DIR, 'references')
 os.makedirs(REFERENCE_FILES_PATH, exist_ok=True)
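If you don't have a photo handy, one way to give the pipeline something to load is to generate a placeholder JPEG with Pillow. This is just a sketch that assumes the current working directory is the project root; a real photo uploaded to the images folder will of course give a meaningful prediction, whereas this solid-colour image will not.

```python
import os
from PIL import Image

# Assumption: the project root is the current working directory,
# matching the folders created above.
images_dir = os.path.join('.', 'images')
os.makedirs(images_dir, exist_ok=True)

# A solid-colour 400x300 RGB placeholder, named like the test image used later.
Image.new('RGB', (400, 300), color=(120, 160, 200)).save(
    os.path.join(images_dir, 'test_pic.jpg'))
```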

Next, let’s install flask-ngrok, which will help us with the temporary URL generation.

!pip install flask-ngrok

After installing flask-ngrok, let’s create a web app, called app, and wrap it with run_with_ngrok(). With this in place, ngrok will start whenever app is run, creating the temporary URL.

 from flask import Flask, jsonify
 from flask_ngrok import run_with_ngrok

 app = Flask(__name__)
 run_with_ngrok(app)

Now, let’s upload the {class index: [class ID, class name]} dictionary, imagenet_class_index.json, into the references folder we created, and load it:

 import json

 imagenet_class_index = json.load(open(os.path.join(REFERENCE_FILES_PATH,
                                                    'imagenet_class_index.json')))
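The JSON file maps string class indices to [class ID, class name] pairs. A minimal sketch of its structure looks like this (the two entries are from the standard ImageNet mapping; the real file has 1000):

```python
# A tiny excerpt of the mapping; the real file has 1000 entries.
imagenet_class_index = {
    "0": ["n01440764", "tench"],
    "1": ["n01443537", "goldfish"],
}

# The model's integer output must be converted to a string key for lookup.
class_id, class_name = imagenet_class_index[str(1)]
print(class_id, class_name)  # n01443537 goldfish
```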

Next, let’s use PyTorch’s torchvision package to create a classifier from one of its pre-trained models (densenet121). Then, we define a pipeline that takes in an image in bytes, resizes it, centre crops it, converts it to a tensor, and normalizes the result to be fed into the model. Note that we should load the model only once, before serving requests, rather than reloading it on every request, which would waste computational power. This is very important in production systems.

 import io
 from PIL import Image
 from torchvision import models, transforms

 model = models.densenet121(pretrained=True)
 model.eval()  # inference mode
 def transform_image(image_bytes):
     my_transforms = transforms.Compose([transforms.Resize(255),
                                         transforms.CenterCrop(224),
                                         transforms.ToTensor(),
                                         transforms.Normalize([0.485, 0.456, 0.406],
                                                              [0.229, 0.224, 0.225])])
     image = Image.open(io.BytesIO(image_bytes))
     return my_transforms(image).unsqueeze(0)
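To make the normalization step concrete, here is a pure-Python sketch (no PyTorch needed) of what per-channel normalization does to a single RGB pixel, using the same ImageNet mean and std values as in the pipeline above:

```python
# Per-channel normalization: out = (value - mean) / std,
# using the ImageNet statistics from the transform pipeline.
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]

# A mid-grey pixel (0.5, 0.5, 0.5) shifts slightly positive on every channel:
print(normalize_pixel([0.5, 0.5, 0.5]))
```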

We can test transform_image by running it on a test image; the output should be a tensor. A tensor is essentially an N-dimensional array, and neural networks expect their inputs as tensors.

 with open(os.path.join(IMAGES_PATH, 'test_pic.jpg'), 'rb') as f:
     image_in_bytes = f.read()
     tensor = transform_image(image_bytes=image_in_bytes)
     print(tensor.shape)  # torch.Size([1, 3, 224, 224])

Next, let’s define the get_prediction method, which takes in the image in bytes, transforms it to a tensor using the pipeline defined above, performs forward propagation through the model, and returns the predicted class. The model outputs an integer class index (0, 1, 2, …), which must be converted to its string form to look up the corresponding class ID and class name in imagenet_class_index.

 def get_prediction(image_bytes):
     tensor = transform_image(image_bytes=image_bytes)
     outputs = model(tensor)  # forward pass
     _, y_hat = outputs.max(1)
     predicted_idx = str(y_hat.item())
     return imagenet_class_index[predicted_idx]
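Here, outputs.max(1) returns both the maximum logit and its index along the class dimension, and y_hat.item() extracts that index as a plain Python int. A pure-Python sketch of the same selection, with a made-up five-class logit vector, looks like this:

```python
# Hypothetical logits for a 5-class model; the real model emits 1000 values.
logits = [0.1, 2.3, -0.7, 5.2, 1.0]

# Equivalent of outputs.max(1): the index of the largest logit.
predicted_idx = max(range(len(logits)), key=lambda i: logits[i])
print(predicted_idx)  # index 3 holds the largest logit (5.2)
```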

We can test the above function by calling it with our test image; it should print the predicted class ID and class name.

 with open(os.path.join(IMAGES_PATH, 'test_pic.jpg'), 'rb') as f:
     image_in_bytes = f.read()
     print(get_prediction(image_bytes=image_in_bytes))

The test_pic used was a girl in a skirt, so the model is quite right! Next, let’s define the web app’s routes. We’ll map the home/root page to (‘/’) and add another page (‘/predict’) where the JSON output should be rendered.

 @app.route('/')
 def root():
     return 'Root of Flask WebApp!'

 @app.route('/predict')
 def predict():
     file_doc = open(os.path.join(IMAGES_PATH, 'wolf.jpg'), 'rb')
     img_bytes = file_doc.read()
     file_doc.close()
     class_id, class_name = get_prediction(image_bytes=img_bytes)
     return jsonify({'class_id': class_id, 'class_name': class_name})
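Before exposing anything publicly, Flask routes can be exercised in-process with Flask's built-in test client, which calls the routes without starting a server. Here is a self-contained sketch using a separate demo_app with a stubbed-out prediction (so it neither touches the model nor interferes with the real app):

```python
from flask import Flask, jsonify

# A stand-alone demo app with hard-coded output, so this cell
# does not clobber the real `app` defined above.
demo_app = Flask(__name__)

@demo_app.route('/')
def demo_root():
    return 'Root of Flask WebApp!'

@demo_app.route('/predict')
def demo_predict():
    # Stand-in values; the real route would call get_prediction().
    return jsonify({'class_id': 'n02114367', 'class_name': 'timber_wolf'})

# test_client() exercises routes without starting a server.
with demo_app.test_client() as client:
    assert client.get('/').data == b'Root of Flask WebApp!'
    print(client.get('/predict').get_json())
```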

Hence, the home page should say ‘Root of Flask WebApp!’ and the /predict page should give us our model’s output for the input image, in this case, a wolf. Let’s run the web app with the following command:

 app.run()
When this runs, ngrok exposes our web app at a temporary URL such as http://be8dd2d747fc.ngrok.io (a different URL is generated on every run). If we go to this URL, we get our home page:

And if we go to the ‘/predict’ route, we get our model’s prediction.

The model is right, and its JSON output is successfully being rendered on a public webpage! That’s it, we have now successfully built a development server and deployed our model on the web. This is a great way to give a demo of our web app before deploying it into production. Nowadays more and more people are using Google Colab because of its low overhead and easy access to GPUs, and now you too know how to create and test ML web apps on Google Colab from scratch!

For the complete running code, please refer to this link.

The post How To Run A Development Server For Flask Web Applications Using Google Colab appeared first on Analytics India Magazine.