In a previous tutorial titled Image Upload from Android to a Python-based Flask Server, we created a project in which an Android app uploads an image to an HTTP server created using Flask in Python.
This tutorial extends the previous project to classify that image in the Flask server using a pre-trained multi-class classification model and display the class label in the Android app. The model is a multilayer perceptron (MLP) created using Keras and trained on the MNIST dataset.
The points covered in this tutorial are as follows:
- Preparing the MNIST Dataset
- Building the MLP
- Training the MLP
- Saving the Model
- Loading the Model and Making Predictions
- Editing the Flask Server to Classify Uploaded Images
The GitHub project for both the previous tutorial and this tutorial is available on this page.
The project has 2 folders. The first folder named Part 1 refers to the project of the previous tutorial. The second folder named Part 2 contains the new Flask server and the Python script for building the MLP that will be discussed in this tutorial. Let’s get started.
Preparing the MNIST Dataset
Before we start building the MLP, it’s essential to download and prepare the MNIST dataset. The MNIST dataset is a handwritten digits recognition dataset for digits 0 to 9. This means there are 10 different classes.
In this tutorial, Keras will be used for downloading such a dataset automatically. Keras has the module keras.datasets, which allows us to download certain datasets automatically, such as CIFAR10, CIFAR100, and MNIST. This module has a sub-module for each dataset. For example, the sub-module for the MNIST dataset is named mnist.
In the code below, the load_data() function of the keras.datasets.mnist module is used to load the dataset. This function checks whether the dataset has already been downloaded. If not, it is downloaded automatically. If it has, the local copy is used directly.
This function returns a tuple of 2 elements. The first holds 2 NumPy arrays representing the training data inputs and labels; the second holds 2 NumPy arrays representing the test data inputs and labels. The 4 arrays returned are as follows:
- x_train: Training inputs.
- y_train: Training labels.
- x_test: Test inputs.
- y_test: Test labels.
import keras
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
print(x_train.shape) # (60000, 28, 28)
print(y_train.shape) # (60000,)
print(x_test.shape) # (10000, 28, 28)
print(y_test.shape) # (10000,)
The MNIST dataset has 70,000 images: 60,000 samples for training and the remaining 10,000 for testing. Each image is a grayscale image of size 28x28. Samples from the dataset are shown in the next figure.
When printing the shapes of the 4 arrays, the result is as shown below. For example, the shape of the training inputs array (x_train) is (60000, 28, 28), which means there are 60,000 samples and the shape of each sample is 28x28. Because there are 60,000 training samples, there are also 60,000 class labels in the y_train variable, one for each sample.
Each sample is assigned a single value representing its class label. For example, a sample assigned the number 3 means that the image has the digit 3.
- x_train: (60000, 28, 28)
- y_train: (60000,)
- x_test: (10000, 28, 28)
- y_test: (10000,)
Because we are building an MLP, the input arrays (train and test) can’t be left in their current form: a dense layer in the MLP accepts each sample as a 1-D vector, not a 2-D array. The inputs can be left as 2-D arrays if, for example, a convolutional neural network (CNN) is used.
So we need to convert each array into a vector. This is why the input arrays are reshaped using the reshape() function, as given in the code below. When converting the image with size 28x28 to a vector, there will be a total of 28x28=784 elements. The number 784 is passed to the reshape() function.
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
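To see what the reshape does without downloading MNIST, here is a minimal sketch on a small stand-in array. The array `images` and its contents are made up for illustration. Passing -1 as the first dimension lets NumPy infer the number of samples, so the same line would work for both the training and the test inputs.

```python
import numpy as np

# Stand-in for x_train: 5 fake "images" of size 28x28 (illustrative data, not MNIST).
images = np.arange(5 * 28 * 28).reshape(5, 28, 28)

# Flatten each 28x28 image into a vector of 784 elements.
flat = images.reshape(-1, 28 * 28)  # -1 lets NumPy infer the sample count
print(flat.shape)  # (5, 784)

# Rows are laid out one after another: pixel (r, c) lands at index r * 28 + c.
assert flat[0, 1 * 28 + 2] == images[0, 1, 2]
```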
The data type of the images is uint8 (unsigned integer stored in 8 bits), and thus each pixel ranges from 0 to 255. We can scale the pixel values to lie between 0 and 1. Rescaling is helpful because it keeps the gradients in the backpropagation phase from becoming large. It also helps when selecting small values for the learning rate.
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
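The effect of the scaling can be checked on a handful of pixel values. This is a small sketch with made-up values rather than real MNIST pixels; it shows that the darkest and brightest values map to 0 and 1, and that casting first keeps the result in float32.

```python
import numpy as np

# Three illustrative pixel values: black, an in-between gray, and white.
pixels = np.array([0, 51, 255], dtype=np.uint8)

scaled = pixels.astype('float32') / 255
print(scaled.dtype)  # float32

# Dividing the uint8 array directly also gives floats, but as float64.
unscaled_cast = pixels / 255
print(unscaled_cast.dtype)  # float64
```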
According to the shape of the train and test labels, there’s a single value assigned for each sample. For multi-class classification, which is the target of this tutorial, each sample must be assigned a target binary vector of length equal to the number of classes.
Because there are 10 classes in the MNIST dataset, each sample must be assigned a binary vector of length 10. Such a vector is all zeros except for a single element set to 1. The index of that element corresponds to the class label.
For example, if a sample has a class label 2, then a vector is created of 10 elements. All elements are zeros except for the element at index 2 which is 1. The binary vector is given below.
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
To convert all class labels to a binary vector form, the keras.utils.to_categorical() function is used.
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
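For intuition, the same encoding can be reproduced with plain NumPy. This is only a sketch of what the conversion produces, not how Keras implements it, and the helper name `one_hot` is hypothetical.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype='float32')
    # Set the element at the class index to 1 for every row.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

encoded = one_hot([2, 7], 10)
print(encoded.shape)  # (2, 10)
print(encoded[0])     # [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
```

Taking the argmax of each row recovers the original integer labels, which is how a predicted vector is turned back into a digit.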
The shapes of the y_train and y_test arrays are now (60000, 10) and (10000, 10), respectively. After doing this, the dataset is prepared for training the MLP. The complete code for loading and preparing the dataset is given below.
Building the MLP
Next, we’ll create our simple MLP in Keras to be trained on the MNIST dataset. The MLP is created as a sequential model using keras.models.Sequential(), as given in the code below.
The stack of layers added to the sequential model contains 3 dense and 2 dropout layers. The first dense layer has 512 neurons and accepts an input of shape 784, which is the flattened image. The last dense layer uses the softmax function with a number of neurons equal to the number of classes, which is 10, to return the probability of each class.
model = keras.models.Sequential()
model.add(keras.layers.Dense(512, activation='relu', input_shape=(784,)))
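The snippet above shows only the first layer. A fuller sketch that is consistent with the layer stack described in the text (3 dense and 2 dropout layers) might look as follows. The dropout rate of 0.2 and the ReLU activation of the second dense layer are assumptions, since the text does not state them. The input shape is declared with keras.Input here, which is an alternative to passing input_shape to the first layer.

```python
import keras

model = keras.models.Sequential([
    keras.Input(shape=(784,)),                     # flattened 28x28 image
    keras.layers.Dense(512, activation='relu'),
    keras.layers.Dropout(0.2),                     # assumed rate
    keras.layers.Dense(512, activation='relu'),    # assumed activation
    keras.layers.Dropout(0.2),                     # assumed rate
    keras.layers.Dense(10, activation='softmax'),  # one probability per class
])

print(model.count_params())  # 669706
```

Dropout layers have no weights, so the parameter count depends only on the three dense layers.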
After building the model, we can print a summary using the model.summary() function. Its output is given below. It lists all layers under the Layer column along with the shape of their outputs in the Output Shape column.
None means that the first dimension of the output shape depends on the number of samples. For example, if 128 samples are fed to the MLP, then None is replaced by 128, and the output shape of the first layer becomes (128, 512).
The Param # column gives the total number of parameters in each layer. For example, the first layer has 401,920 parameters. This is calculated from the product of the number of inputs (784) and the number of neurons in the layer (512). The product 784*512 is equal to 401,408. Remember that there is a bias for each neuron, so an additional 512 parameters are added, giving a total of 401,920.
At the end of the printed output, there’s a summary reflecting the total number of parameters, both trainable and non-trainable. Because we created the model from scratch, we need to train every parameter without excluding any. Thus, there are 0 non-trainable parameters.
Layer (type)           Output Shape    Param #
dense_1 (Dense)        (None, 512)     401920
dropout_1 (Dropout)    (None, 512)     0
dense_2 (Dense)        (None, 512)     262656
dropout_2 (Dropout)    (None, 512)     0
dense_3 (Dense)        (None, 10)      5130
Total params: 669,706
Trainable params: 669,706
Non-trainable params: 0
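The numbers in the Param # column can be reproduced with a few lines of arithmetic. The helper name `dense_params` is hypothetical; it applies the rule from the text, one weight per input-neuron pair plus one bias per neuron.

```python
def dense_params(n_inputs, n_neurons):
    """Weights (n_inputs * n_neurons) plus one bias per neuron."""
    return n_inputs * n_neurons + n_neurons

# Input size followed by the number of neurons in each dense layer.
layer_sizes = [784, 512, 512, 10]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [401920, 262656, 5130]
print(sum(counts))  # 669706
```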
The last thing to do before training the model is to define the loss function, optimizer, and metrics for training the model. This is done using the compile() function, as given below.
In this tutorial, the categorical cross-entropy function is used because we’re solving a multi-class problem. Note that to be able to use such a loss function, each sample must be assigned a target vector of length equal to the number of classes. This is what we did using the keras.utils.to_categorical() function.
The Root Mean Square Propagation (RMSprop) optimizer is used. There’s a single metric, the classification accuracy.
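A minimal sketch of the compile step described above. The small stand-in model is here only so the snippet runs on its own; in the tutorial the call would be made on the MLP built earlier. Using RMSprop with its default settings is an assumption, since the text does not state a learning rate.

```python
import keras

# Stand-in model so the snippet is self-contained (same input and output sizes as the MLP).
model = keras.models.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(10, activation='softmax'),
])

model.compile(loss='categorical_crossentropy',       # expects one-hot target vectors
              optimizer=keras.optimizers.RMSprop(),  # default settings (assumption)
              metrics=['accuracy'])                  # report classification accuracy
```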
Training the MLP
After preparing the model, next we need to train it using the fit() function as given below. It accepts the training inputs (x_train), training labels (y_train), batch size (set to 128), number of epochs (set to 20), verbose=1 to print some messages while the model is being trained, and finally the validation data.
batch_size = 128
epochs = 20
history = model.fit(x_train, y_train,
                    batch_size=batch_size, epochs=epochs,
                    verbose=1, validation_data=(x_test, y_test))
After training completed on my machine, I got the following accuracies for the training data over the 20 epochs. After epoch number 20, the accuracy for the training data is 99.5%.
0.9232, 0.9686, 0.9771, 0.9818, 0.9855, 0.9872, 0.9886, 0.9901, 0.9906, 0.9914, 0.9922, 0.9930, 0.9930, 0.9935, 0.9941, 0.9940, 0.9950, 0.9947, 0.9947, 0.9950
The epoch number versus the training accuracy is plotted in the next figure.
The accuracies for the validation data over the 20 epochs are given below. After epoch number 20, the accuracy for the validation data is 98.41%.
0.9668, 0.9695, 0.9773, 0.9760, 0.9790, 0.9788, 0.9815, 0.9836, 0.9834, 0.9824, 0.9831, 0.9810, 0.9824, 0.9821, 0.9831, 0.9837, 0.9839, 0.9830, 0.9845, 0.9841
The epoch number versus validation accuracy is plotted in the next figure.
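The two lists above can also be compared directly in code. The sketch below copies the reported values into two lists (the names `train_acc` and `val_acc` are illustrative) and finds the epoch with the highest validation accuracy and the gap between training and validation accuracy at the end.

```python
train_acc = [0.9232, 0.9686, 0.9771, 0.9818, 0.9855, 0.9872, 0.9886, 0.9901,
             0.9906, 0.9914, 0.9922, 0.9930, 0.9930, 0.9935, 0.9941, 0.9940,
             0.9950, 0.9947, 0.9947, 0.9950]
val_acc = [0.9668, 0.9695, 0.9773, 0.9760, 0.9790, 0.9788, 0.9815, 0.9836,
           0.9834, 0.9824, 0.9831, 0.9810, 0.9824, 0.9821, 0.9831, 0.9837,
           0.9839, 0.9830, 0.9845, 0.9841]

# Epochs are numbered from 1, list indices from 0.
best_epoch = max(range(len(val_acc)), key=val_acc.__getitem__) + 1
final_gap = round(train_acc[-1] - val_acc[-1], 4)

print(best_epoch, val_acc[best_epoch - 1])  # 19 0.9845
print(final_gap)                            # 0.0109
```

The validation accuracy levels off around 98.3% to 98.4% while the training accuracy keeps creeping upward, so the final gap of roughly one percentage point is consistent with mild overfitting.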
To evaluate the model on the validation data, the evaluate() function is used, as given below. Because the model was compiled with a single metric, it returns a list of 2 values: the loss followed by the classification accuracy. The accuracy matches the one reported for the last epoch, which is 98.41%.
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
Saving the Model
After model training is complete, don’t forget to save the model for later use with the model.save() function. Given just a file name such as model.h5, the model is saved in the current working directory, which is usually the directory the Python script is run from.
The complete code required to prepare the data, build, train, and save the model is shown below.
Loading the Model and Making Predictions
After that, we can load the model and test it with some new samples. We load the model using the keras.models.load_model() function. For making predictions, the predict_classes() function is used. It accepts a NumPy array of samples and returns another NumPy array holding the predicted class label of each sample. The per-class scores themselves are returned by the predict() function.
In this example, the test sample is selected from the test data x_test[0, :] which is the first sample. Its class label is 7. When this sample is fed to the trained MLP, it’s able to make the correct prediction, as shown below.
loaded_model = keras.models.load_model('model.h5')
predicted_label = loaded_model.predict_classes(numpy.array([x_test[0, :]]))
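Note that predict_classes() is not available in recent Keras releases. There, the same labels can be obtained by taking the argmax over the scores returned by predict(). The sketch below uses a made-up score array in place of real model output, and the helper name `scores_to_labels` is hypothetical.

```python
import numpy as np

def scores_to_labels(scores):
    """Pick the index of the highest score in each row."""
    return np.argmax(scores, axis=-1)

# Made-up softmax output for 2 samples and 10 classes (illustrative, not real predictions).
scores = np.array([
    [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.90, 0.02, 0.01],
    [0.05, 0.80, 0.05, 0.02, 0.02, 0.02, 0.01, 0.01, 0.01, 0.01],
])

labels = scores_to_labels(scores)
print(labels)  # [7 1]
```

With a loaded model, the call would take the form scores_to_labels(loaded_model.predict(samples)), where samples has shape (number of samples, 784).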
After saving the model and making predictions on a few new samples, we now need to edit the Flask server created in the previous tutorial to classify the uploaded images and respond with the classification label.
Editing the Flask Server to Classify Uploaded Images
The same implementation of the Flask server developed in the previous tutorial will be used, but with an extension. The new code is listed below.
The extension allows the server to read the image as grayscale using the scipy.misc module, predict its class label with the pre-trained Keras model using the predict_classes() function, and return the classification label as a string.
In the Android app, an image of the MNIST dataset (of shape 28x28) is selected and uploaded to the server. After making sure the server is active, click the “Connect to Server” button to upload the image and receive its class label. The next figure shows that the server responded with the label 7 after uploading an image with the digit 7.
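Whatever library reads the uploaded file, the image has to go through the same preparation as the training data before it reaches the model. The image-reading helpers in scipy.misc were removed from later SciPy releases, so the reading step may need another library there. Below is a sketch of the preparation step only, assuming the image arrives as a 28x28 grayscale NumPy array with values from 0 to 255. The function name `prepare_for_mlp` is hypothetical and not part of the original server code.

```python
import numpy as np

def prepare_for_mlp(gray_image):
    """Turn one 28x28 grayscale image into a (1, 784) float32 batch in [0, 1]."""
    gray_image = np.asarray(gray_image)
    if gray_image.shape != (28, 28):
        raise ValueError(f"expected a 28x28 image, got shape {gray_image.shape}")
    # Same steps as for the training data: flatten, cast, and rescale.
    return gray_image.reshape(1, 784).astype('float32') / 255

# Illustrative input: an all-white image.
batch = prepare_for_mlp(np.full((28, 28), 255, dtype=np.uint8))
print(batch.shape, batch.dtype)  # (1, 784) float32
```

The returned batch has a leading dimension of 1 because the model expects an array of samples, even when only a single image is classified.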
Through 2 tutorials, here’s where we stand.
We’ve created an Android application that uploads an image to a server created using Flask in Python. At the server, there’s a pre-trained model that classifies images from the MNIST dataset. The uploaded image is then classified and the class label is returned to the Android app.
Next, rather than an MLP, we’ll build a convolutional neural network (CNN) in Keras, an architecture that is particularly well suited to image recognition.