TensorFlow 2.0: Create and Train a Vanilla CNN on Google Colab
An Introduction to Colab and TensorFlow 2.0
TensorFlow 2.0 came out less than a week ago and it has been the most discussed topic on my Twitter timeline, so I decided to join the buzz and write this article. The release brings some major improvements to TensorFlow while also making it easier for anyone to get started. TensorFlow 2.0 is now tightly integrated with Keras, which makes it easier to build and test simple models with little expertise. Google has also decided to emphasize “eager execution” over session-based models. With eager execution, your Python operations are evaluated immediately instead of being added to a computation graph that runs later. This article is a step-by-step tutorial on how to use Google Colab and build a CNN model in TensorFlow 2.0.
For those of you who are not aware of what Google Colab is (if you are, you can skip the next few lines), it is a hosted Jupyter notebook environment that lets you write and share code in the browser. The best part is that the code executes on Google’s servers. You can even choose to train your models on Google’s cloud GPUs or TPUs.
To train your model on a GPU or TPU, go to Runtime → Change Runtime Type → Hardware Accelerator.
The dataset we will be working with is the German Traffic Sign Recognition Benchmark (GTSRB). It contains over 50,000 images spread across 43 classes of traffic signs.
Let’s get started with the code.
Playing with our data
The first part is importing your dataset into Google Colab. You can do this by uploading your data to your Google Drive and mounting the drive in your Colab notebook. After uploading your dataset as a .zip file to your Drive, mount the drive from a notebook cell.
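Mounting the drive, as described above, might look like the sketch below. The mount point /content/drive is the conventional choice, and the IN_COLAB flag is a hypothetical name added here so that the cell fails gracefully outside Colab, where the google.colab module does not exist.

```python
# Mount Google Drive inside a Colab runtime.
# Assumption: /content/drive is used as the mount point.
try:
    from google.colab import drive  # only importable inside Colab

    drive.mount("/content/drive")  # asks you to authorize access to your Drive
    IN_COLAB = True
except ImportError:
    # Not running in Colab, so there is nothing to mount.
    IN_COLAB = False

print("Running in Colab:", IN_COLAB)
```

Once mounted, the files in your Drive appear under the mount point like any other directory.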
You can unzip the file using the shell command unzip. Shell commands can be invoked in notebook cells by prefixing the command with ‘!’.
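In a notebook cell the shell route is a one-liner of the form shown in the comment below. As a pure-Python alternative (a swapped-in technique, not what the article itself uses), the standard zipfile module does the same job. The helper name extract_zip and both paths are hypothetical.

```python
import zipfile

# Notebook-cell equivalent (shell command, prefixed with '!'):
#   !unzip <path-to-zip> -d <destination-dir>


def extract_zip(zip_path, dest_dir):
    """Extract every file in the archive into dest_dir and list what was extracted."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

Calling extract_zip with the path of the uploaded archive on the mounted drive and a destination folder unpacks the dataset and returns the names of the extracted files.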
The dataset is already split into training and testing images, with 75% of the images used for training and the rest for testing our model. Since we have mounted our drive, we can access the dataset by referencing its path in the drive and load the training and testing data from there.
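A minimal sketch of the loading step follows. It assumes a folder layout with one sub-folder per class, each named after its numeric class id. The helper name load_image_folder and that layout are assumptions for illustration, so adapt them to how your copy of the dataset is organised.

```python
import os

import numpy as np
from PIL import Image


def load_image_folder(root, size=(100, 100)):
    """Load every image under root/<class_id>/ and resize it to a common size."""
    images, labels = [], []
    for class_name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files such as CSV indexes
        for file_name in sorted(os.listdir(class_dir)):
            with Image.open(os.path.join(class_dir, file_name)) as im:
                # Force 3 channels and bring every image to the same dimensions.
                im = im.convert("RGB").resize(size)
                images.append(np.asarray(im))
            labels.append(int(class_name))
    # Lists become arrays of shape (N, height, width, 3) and (N,).
    return np.array(images), np.array(labels)
```

The function returns the images and labels as NumPy arrays, which is the form the later steps expect.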
We use PIL to load the images from the directory and, since the images have different dimensions, we use im.resize() to resize each image to a standard size of 100x100. We then convert train_images, train_labels and test_images from lists to NumPy arrays. The array train_labels is one-dimensional, so we reshape it into a single column with one label per row, which is the two-dimensional shape the one-hot encoder expects.
Note: ‘-1’ in a reshape tells NumPy to infer that dimension from the size of the array.
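The reshape can be illustrated on a tiny, made-up label array (the values here are placeholders, not real dataset labels):

```python
import numpy as np

train_labels = np.array([3, 14, 0, 42])      # shape (4,): one label per image
train_labels = train_labels.reshape(-1, 1)   # -1 lets NumPy infer the row count
print(train_labels.shape)  # (4, 1)
```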
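Now that we have reshaped our training labels, we can convert them into a one-hot encoding. One-hot encoding represents each class label as a vector with a 1 at the index of that class and 0 everywhere else, which lets us compare the labels directly with the per-class probabilities the model outputs.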
Scikit-learn provides a ready-made encoder that we can import directly to create our one-hot encoded labels. We then normalize our images by dividing them by 255.
Dividing the images by 255.0 rescales the pixel values in each image to the range 0.0–1.0, which tends to give better results from our model.
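A sketch of both steps on placeholder data follows. It assumes the encoder in question is scikit-learn’s OneHotEncoder. The arrays here are random stand-ins for the real images and labels.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Placeholder data: 4 labels in a column, 4 random 100x100 RGB images.
train_labels = np.array([0, 2, 1, 2]).reshape(-1, 1)
train_images = np.random.randint(0, 256, size=(4, 100, 100, 3), dtype=np.uint8)

# One column per class seen in the data; .toarray() turns the sparse result dense.
encoder = OneHotEncoder()
train_labels_onehot = encoder.fit_transform(train_labels).toarray()

# Rescale pixel values from 0-255 to 0.0-1.0.
train_images = train_images / 255.0

print(train_labels_onehot)
```

Note that the encoder only creates columns for classes that actually occur in the labels it is fitted on, so fit it on the full set of training labels.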
Building our model
Before we start coding our model, let’s check whether our Colab runtime is using TensorFlow 2.0, which we can do by printing the installed version.
If an earlier version of TensorFlow is installed, we can upgrade it with pip from a notebook cell.
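The version check might look like this. The pip command in the message is one common way to upgrade and is given as an assumption. Restart the runtime after upgrading so the new version is picked up.

```python
import tensorflow as tf

print(tf.__version__)

# Compare only the major version number.
major = int(tf.__version__.split(".")[0])
if major < 2:
    # In a notebook cell, shell commands are prefixed with '!'.
    print("Upgrade with: !pip install --upgrade tensorflow")
```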
We can now start by doing the necessary imports and creating batches of tensors from our training data.
TensorFlow’s dataset library (tf.data) has been expanded in version 2.0. We use three of its functions here:
- from_tensor_slices() takes NumPy arrays as arguments and creates a dataset whose elements are slices of those arrays along the first dimension, here one (image, label) pair per element.
- shuffle() takes buffer_size as an argument, fills a buffer with that many elements and randomly samples from it.
- batch() takes batch_size as an argument and combines consecutive elements of the dataset into batches of that size.
Remember that TensorFlow 2.0 emphasizes “eager execution” over explicit computation graphs, so once the cell is executed the dataset can be iterated directly like any Python iterable. Each element that train_ds yields is a pair of tensors, randomly sampled and batched, holding the training images and their labels.
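The pipeline described above can be sketched as follows. The arrays are random placeholders with the same shapes as the real data, and the buffer and batch sizes are arbitrary choices for illustration.

```python
import numpy as np
import tensorflow as tf

# Placeholder data: 64 normalized 100x100 RGB images with one-hot labels (43 classes).
train_images = np.random.rand(64, 100, 100, 3).astype("float32")
train_labels = np.eye(43, dtype="float32")[np.random.randint(0, 43, size=64)]

train_ds = (
    tf.data.Dataset.from_tensor_slices((train_images, train_labels))
    .shuffle(64)   # buffer_size: how many elements to sample from at a time
    .batch(32)     # batch_size: elements per batch
)

# Thanks to eager execution, the dataset can be iterated like a Python iterable.
images, labels = next(iter(train_ds))
print(images.shape, labels.shape)  # (32, 100, 100, 3) (32, 43)
```

A buffer_size at least as large as the dataset gives a full shuffle, while smaller buffers trade randomness for memory.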
Let us start coding the architecture of the model.
We create our model class (MyModel) as a subclass of the Keras Model class, which saves us from writing our own model class from scratch. The architecture is a simple CNN with dense layers for class prediction. After defining the architecture, we create an instance of the model and move on to define our loss function, optimizer and metrics.
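A minimal sketch of such a subclassed model is shown below. The number of layers, filter counts and layer sizes are assumptions chosen for illustration, not necessarily the ones used in the linked repository. The default of 43 output classes matches GTSRB.

```python
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D


class MyModel(Model):
    """A small CNN: two conv/pool stages followed by dense layers."""

    def __init__(self, num_classes=43):
        super().__init__()
        self.conv1 = Conv2D(32, 3, activation="relu")
        self.pool1 = MaxPooling2D()
        self.conv2 = Conv2D(64, 3, activation="relu")
        self.pool2 = MaxPooling2D()
        self.flatten = Flatten()
        self.d1 = Dense(128, activation="relu")
        self.d2 = Dense(num_classes, activation="softmax")

    def call(self, x):
        x = self.pool1(self.conv1(x))
        x = self.pool2(self.conv2(x))
        x = self.flatten(x)
        x = self.d1(x)
        return self.d2(x)  # one probability per class


model = MyModel()
probs = model(tf.zeros((2, 100, 100, 3)))  # dummy batch of two 100x100 RGB images
print(probs.shape)  # (2, 43)
```

Because the last layer uses softmax, each row of the output sums to 1 and can be read as a probability distribution over the classes.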
- train_loss will be the mean of the losses over all batches in each epoch
- train_accuracy will be the accuracy of our model for each epoch
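The loss, optimizer and metric objects could be set up as in this sketch. The specific choices (categorical cross-entropy, Adam, categorical accuracy) are assumptions that fit one-hot labels and a softmax output. The two-class tensors exist only to show how the metrics accumulate.

```python
import tensorflow as tf

loss_object = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()
train_loss = tf.keras.metrics.Mean(name="train_loss")
train_accuracy = tf.keras.metrics.CategoricalAccuracy(name="train_accuracy")

# Metrics accumulate across calls until they are reset.
y_true = tf.constant([[0.0, 1.0], [1.0, 0.0]])
y_pred = tf.constant([[0.1, 0.9], [0.2, 0.8]])
train_loss(loss_object(y_true, y_pred))
train_accuracy(y_true, y_pred)
print(float(train_accuracy.result()))  # 0.5: only the first prediction is correct
```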
The training step function is what trains our model. It takes a batch of images and their labels, and computes the loss and the gradients.
- Because TensorFlow 2.0 uses eager execution by default, there can be a cost in performance and deployability. To recover both, we can add the @tf.function decorator, which converts the function into a graph.
- tf.GradientTape() is a high-level API for automatic differentiation. In the training step, we compute the loss from the true and predicted labels inside the tape, use the tape to compute the gradients of the loss with respect to the model’s trainable variables, and pass those gradients to the optimizer, which updates the variables.
- We also update the training loss and accuracy metrics at each step.
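The points above can be sketched as a single training step. To keep the example self-contained, a tiny stand-in model and random data replace MyModel and the real dataset. Those stand-ins, and the choice of loss and optimizer, are assumptions.

```python
import tensorflow as tf

# Tiny stand-in model so the sketch runs on its own; the article uses MyModel.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(3, activation="softmax"),
])
loss_object = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()
train_loss = tf.keras.metrics.Mean(name="train_loss")
train_accuracy = tf.keras.metrics.CategoricalAccuracy(name="train_accuracy")


@tf.function  # traces the Python function into a graph on the first call
def train_step(images, labels):
    with tf.GradientTape() as tape:
        # Operations inside the tape are recorded for differentiation.
        predictions = model(images, training=True)
        loss = loss_object(labels, predictions)
    # d(loss)/d(variable) for every trainable variable of the model.
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    # Update the running metrics.
    train_loss(loss)
    train_accuracy(labels, predictions)


images = tf.random.uniform((8, 4, 4, 3))
labels = tf.one_hot(tf.random.uniform((8,), maxval=3, dtype=tf.int32), depth=3)
train_step(images, labels)
print(float(train_loss.result()))
```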
Now that we have done all the necessary steps to build our model, we can start training it by running the training loop.
We train our model for 5 epochs and save the weights of our model after each epoch. Note that the model weights will be saved in Google Drive. We also reset our training loss and accuracy metrics after each epoch.
To load the weights of the model, create an instance of the MyModel class and call its load_weights(path) method.
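A compact sketch of the training loop, the per-epoch save and the later restore is shown below. It uses a stand-in model, random data and a temporary directory instead of MyModel, the real dataset and Google Drive. The training step is inlined and runs eagerly to keep the example short. The file name and the .weights.h5 suffix are assumptions (recent Keras releases require that suffix).

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def make_model():
    # Stand-in for MyModel so the sketch is self-contained.
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(3, activation="softmax"),
    ])


model = make_model()
loss_object = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()
train_loss = tf.keras.metrics.Mean()
train_accuracy = tf.keras.metrics.CategoricalAccuracy()

images = np.random.rand(16, 4, 4, 3).astype("float32")
labels = np.eye(3, dtype="float32")[np.random.randint(0, 3, size=16)]
train_ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(8)

# The article saves to Google Drive; a temporary directory is used here.
weights_path = os.path.join(tempfile.mkdtemp(), "cnn.weights.h5")

EPOCHS = 5
for epoch in range(EPOCHS):
    for batch_images, batch_labels in train_ds:
        with tf.GradientTape() as tape:
            predictions = model(batch_images, training=True)
            loss = loss_object(batch_labels, predictions)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        train_loss(loss)
        train_accuracy(batch_labels, predictions)
    model.save_weights(weights_path)  # overwrite the checkpoint after each epoch
    print(f"Epoch {epoch + 1}: loss={float(train_loss.result()):.4f}, "
          f"accuracy={float(train_accuracy.result()):.4f}")
    # Start the next epoch from zero (TF 2.0 itself named this reset_states()).
    train_loss.reset_state()
    train_accuracy.reset_state()

# Restore: create a fresh model, call it once so its variables exist, then load.
restored = make_model()
restored(images[:1])
restored.load_weights(weights_path)
```

Calling the fresh model on one sample before load_weights matters for models without a declared input shape, since their variables are only created on the first call.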
Predicting on Test Set
We can obtain the model’s predictions by passing test_images to the model. Since the model returns a probability distribution over the classes for each image, we use np.argmax() to get the index of the highest probability, which is the predicted class.
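The argmax step can be sketched with a small helper. The helper name predict_classes and the fake_model stand-in (which returns fixed probabilities for three classes) are hypothetical. In practice you would pass the trained model instead.

```python
import numpy as np


def predict_classes(model, images):
    """Return the most probable class index for each image."""
    probabilities = np.asarray(model(images))  # shape: (num_images, num_classes)
    return np.argmax(probabilities, axis=1)    # index of the largest value per row


# Hypothetical stand-in "model" returning fixed probabilities for two images.
def fake_model(images):
    return np.array([[0.1, 0.7, 0.2],
                     [0.6, 0.3, 0.1]])


test_images = np.zeros((2, 100, 100, 3), dtype="float32")
print(predict_classes(fake_model, test_images))  # [1 0]
```

For a large test set, calling the Keras model’s predict() method instead of the model itself processes the images in batches and avoids running out of memory.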
By following the above steps you have successfully trained a CNN on Colab using TensorFlow 2.0. Kindly reach out to me if you have any doubts :)
Link to full code: https://github.com/grohith327/traffic_sign_detection