[Nearly] Everything you need to know in 2019
Data augmentation is the process of creating new data points by manipulating existing data. For images, for example, this can be done by rotating, resizing, cropping, and more.
This increases the diversity of the data available for training deep learning models without having to actually collect new data, which generally improves model performance.
In this guide, we’ll look at several papers and research efforts that tackle this challenge.
Random Erasing Data Augmentation (2017)
Random Erasing is a data augmentation method for training convolutional neural networks that randomly erases a rectangular region of an image, generating training images with various levels of occlusion. This makes a model more robust to occlusion and reduces the risk of overfitting.
Because only a local region is erased, the overall structure of the image is preserved. Pixels in the erased region are re-assigned random values. The method is similar to applying dropout at the image level: it requires no parameter learning, is lightweight, and consumes little memory. It is evaluated on CIFAR-10, CIFAR-100, and Fashion-MNIST.
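The operation itself is simple to implement. Below is a minimal NumPy sketch (parameter ranges follow the paper's defaults; the function name and signature are our own):

```python
import random
import numpy as np

def random_erasing(img, p=0.5, area_frac=(0.02, 0.4), aspect=(0.3, 3.3)):
    """Randomly erase a rectangle in `img` (H x W x C), filling it
    with random pixel values, as described in the Random Erasing paper."""
    if random.random() > p:          # apply with probability p
        return img
    h, w, c = img.shape
    for _ in range(100):             # retry until a valid rectangle fits
        target_area = random.uniform(*area_frac) * h * w
        ratio = random.uniform(*aspect)
        eh = int(round((target_area * ratio) ** 0.5))
        ew = int(round((target_area / ratio) ** 0.5))
        if eh < h and ew < w:
            top = random.randint(0, h - eh)
            left = random.randint(0, w - ew)
            out = img.copy()
            out[top:top + eh, left:left + ew, :] = np.random.randint(
                0, 256, size=(eh, ew, c), dtype=img.dtype)
            return out
    return img
```

A usage note: in practice this runs as part of the training-time input pipeline, after normalization or before it depending on the framework.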
The architectures adopted in this paper are ResNet, ResNeXt, and Wide Residual Networks. The performance of the method on various datasets is shown below.
AutoAugment: Learning Augmentation Strategies from Data (CVPR 2019)
AutoAugment is an augmentation strategy that uses a search algorithm to find the augmentation policy that yields the best results for a model. Each policy has several sub-policies, and one sub-policy is randomly chosen for each image. Each sub-policy consists of image processing operations (such as translation, shearing, or rotation) and the probabilities with which they are applied. The best policy is the one that yields the highest validation accuracy under the search algorithm.
During experimentation, reinforcement learning is used as the search algorithm. Learned policies are easily transferable to new datasets. AutoAugment was tested on CIFAR-10, reduced CIFAR-10, CIFAR-100, SVHN, reduced SVHN, and ImageNet.
The method has two components: a search algorithm and a search space. The search algorithm is implemented as a controller RNN. It samples a data augmentation policy that specifies which image processing operations to use, the probability of applying each operation in each batch, and the magnitude of each operation. This policy is then used to train a neural network with a fixed architecture. The resulting validation accuracy is sent back to update the controller via policy gradient methods.
In the search space, a policy consists of 5 sub-policies. Each sub-policy has two image operations that are applied in sequence. Every operation is linked to two hyperparameters. These are the probability of applying the operation and the magnitude of the operation.
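The policy structure described above can be sketched as follows. Note that the operations here are simple NumPy stand-ins; the paper actually searches over PIL transforms such as ShearX, Rotate, and Solarize, with magnitudes discretized into a fixed range:

```python
import random
import numpy as np

# Stand-in image operations keyed by name. Each takes an image array
# and a magnitude (ignored by magnitude-free ops such as flipping).
OPS = {
    "flip_lr": lambda img, m: np.fliplr(img),
    "roll_x":  lambda img, m: np.roll(img, shift=m, axis=1),  # crude translate
    "invert":  lambda img, m: 255 - img,
}

# A policy is a set of sub-policies; each sub-policy is two
# (operation, probability, magnitude) triples applied in sequence.
policy = [
    [("roll_x", 0.8, 4), ("invert", 0.3, 0)],
    [("flip_lr", 0.5, 0), ("roll_x", 0.6, 2)],
    # ... the paper's policies contain 5 sub-policies
]

def apply_policy(img, policy):
    sub = random.choice(policy)      # one sub-policy chosen per image
    for name, prob, mag in sub:
        if random.random() < prob:   # each op fires with its own probability
            img = OPS[name](img, mag)
    return img
```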
Here are some of the results obtained with this method:
Fast AutoAugment (2019)
The Fast AutoAugment algorithm finds effective augmentation policies using a search strategy based on density matching. Unlike AutoAugment, it doesn’t require repeated training of child models; instead, it searches for augmentation policies that maximize the match between the distribution of an augmented data split and the distribution of another, unaugmented split, using a single model.
The method improves the generalization performance of a network by learning augmentation policies that treat augmented data as missing data points of the training data. It recovers these missing data points by exploring and exploiting a family of inference-time augmentations through Bayesian optimization in the policy search phase.
Here are some of the results obtained using this method:
Learning Data Augmentation Strategies for Object Detection (2019)
While this paper doesn’t itself propose a model architecture, it proposes learning transformations that can be applied to object detection datasets and then transferred to other object detection datasets. The transformations are applied at training time. Here, an augmentation policy is defined as a set of n policies selected at random during the training process. Operations applied in this work include distorting color channels, distorting images geometrically, and distorting only the pixel content found inside the bounding box annotations.
Experiments on the COCO dataset show that optimizing a data augmentation policy can improve detection accuracy by more than +2.3 mean average precision (mAP), allowing a single inference model to achieve 50.7 mAP.
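A defining detail of detection augmentation is that geometric transforms must also update the bounding boxes, not just the pixels. As a simplified illustration (not the paper's code), a horizontal flip with box adjustment might look like this:

```python
import numpy as np

def hflip_with_boxes(img, boxes):
    """Horizontally flip an image (H x W x C) together with its boxes.
    Boxes are (xmin, ymin, xmax, ymax) tuples in pixel coordinates."""
    w = img.shape[1]
    flipped = img[:, ::-1, :]
    # Mirror the x-coordinates; y-coordinates are unchanged.
    new_boxes = [(w - xmax, ymin, w - xmin, ymax)
                 for xmin, ymin, xmax, ymax in boxes]
    return flipped, new_boxes
```

Box-only operations from the paper (e.g. distorting pixels inside an annotation) would instead slice the image with the box coordinates and transform only that region.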
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition (Interspeech 2019)
SpecAugment is a data augmentation method for speech recognition that’s applied directly to the feature inputs of a neural network. It involves warping the features, masking blocks of frequency channels, and masking blocks of time steps, and it’s used in end-to-end speech recognition tasks. On LibriSpeech, the method achieves a 6.8% word error rate (WER).
SpecAugment operates on the log mel spectrogram of the input audio. The method is computationally easy to apply because it directly acts on the spectrogram as if it were an image. It doesn’t require additional data.
SpecAugment consists of three deformations of the spectrogram: time warping, frequency masking, and time masking. Time warping deforms the time series in the time direction. In time and frequency masking, a block of consecutive time steps or mel frequency channels is masked. The method is used to train end-to-end ASR networks such as Listen, Attend and Spell (LAS).
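The two masking deformations are straightforward to sketch on a spectrogram array. Time warping, which requires interpolation, is omitted here, and the parameter names and defaults are illustrative rather than the paper's exact settings:

```python
import numpy as np

def spec_augment(spec, F=8, T=10, n_freq_masks=1, n_time_masks=1):
    """Apply frequency and time masking to a log mel spectrogram
    `spec` of shape (n_mels, n_frames). Masked regions are zeroed."""
    spec = spec.copy()
    n_mels, n_frames = spec.shape
    for _ in range(n_freq_masks):
        f = np.random.randint(0, F + 1)            # mask width in channels
        f0 = np.random.randint(0, n_mels - f + 1)  # starting channel
        spec[f0:f0 + f, :] = 0.0                   # mask a band of mel channels
    for _ in range(n_time_masks):
        t = np.random.randint(0, T + 1)            # mask width in frames
        t0 = np.random.randint(0, n_frames - t + 1)
        spec[:, t0:t0 + t] = 0.0                   # mask consecutive time steps
    return spec
```

Because the masks act on the spectrogram as if it were an image, this slots into an audio pipeline with no extra data or model changes.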
EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks (EMNLP-IJCNLP 2019)
EDA is made up of the following operations: synonym replacement, random insertion, random swap, and random deletion.
Synonym replacement involves randomly selecting words that are not stop words and replacing them with random synonyms. Random insertion involves taking a random synonym of a random word and inserting it at a random position in the sentence. Random swap involves randomly choosing two words in the sentence and swapping their positions. Random deletion involves removing each word in the sentence with a certain probability.
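Two of these operations can be sketched in a few lines of Python (a simplified illustration, not the authors' reference implementation; synonym-based operations additionally need a thesaurus such as WordNet):

```python
import random

def random_swap(words, n=1):
    """Randomly choose two words and swap their positions, n times."""
    words = words.copy()
    for _ in range(n):
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p=0.1):
    """Remove each word independently with probability p,
    keeping at least one word so the sentence never vanishes."""
    kept = [w for w in words if random.random() > p]
    return kept if kept else [random.choice(words)]
```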
For experimentation, LSTM-RNNs and convolutional neural networks are used. Here are the results obtained:
Unsupervised Data Augmentation for Consistency Training (2019)
This paper proposes a way to add noise to unlabeled data; the quality of the noise injection is fundamental in semi-supervised learning. The paper explores noise injection in consistency training and finds that advanced data augmentation methods also perform well in semi-supervised learning. Consistency training methods regularize model predictions to be invariant to small perturbations of the input data or hidden states.
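The core of consistency training can be sketched with a toy model: penalize the divergence between the model's predictions on an example and on its augmented version. This is a simplified NumPy illustration; the full UDA method adds details such as confidence thresholding and treating the unaugmented prediction as a fixed target:

```python
import numpy as np

def kl_div(p, q, eps=1e-8):
    """KL divergence between two discrete distributions p and q."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def consistency_loss(model, x, augment):
    """Unsupervised consistency loss: the model's prediction on an
    augmented example should match its prediction on the original."""
    p_orig = model(x)
    p_aug = model(augment(x))
    return kl_div(p_orig, p_aug)
```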
The authors propose the substitution of traditional noise injection methods with high-quality data augmentation methods in order to improve consistency training. Some of the augmentation strategies used in this paper are RandAugment for Image Classification, Back-translation for Text Classification, and Word replacing with TF-IDF for Text Classification. RandAugment doesn’t use search but uniformly samples from the same set of augmentation transformations in PIL.
Back-translation involves translating an existing example from language A into another language B, and then translating it back into A to obtain an augmented sample. The authors used back-translation to paraphrase the training data for the text classification tasks.
For word replacing with TF-IDF, uninformative words with low TF-IDF scores are replaced, while those with high TF-IDF values are kept.
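A simplified sketch of this idea is below. For brevity, low-score words are replaced with uniformly random vocabulary words; the paper samples replacement words in a TF-IDF-aware way, and the threshold here is purely illustrative:

```python
import math
import random
from collections import Counter

def tfidf_scores(docs):
    """Per-document TF-IDF scores for a list of tokenized documents."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))   # document frequency
    idf = {w: math.log(n / df[w]) for w in df}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({w: tf[w] / len(doc) * idf[w] for w in tf})
    return scores

def tfidf_replace(doc, scores, vocab, threshold=0.05):
    """Replace uninformative (low TF-IDF) words with random vocabulary
    words; keep high TF-IDF words, which carry the document's meaning."""
    return [w if scores[w] >= threshold else random.choice(vocab)
            for w in doc]
```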
Here’s how the method performs:
We should now be up to speed on some of the most common — and a couple of very recent — data augmentation methods.
The papers/abstracts mentioned and linked to above also contain links to their code implementations. We’d be happy to see the results you obtain after testing them.
Editor’s Note: Heartbeat is a contributor-driven online publication and community dedicated to exploring the emerging intersection of mobile app development and machine learning. We’re committed to supporting and inspiring developers and engineers from all walks of life.
Editorially independent, Heartbeat is sponsored and published by Fritz AI, the machine learning platform that helps developers teach devices to see, hear, sense, and think. We pay our contributors, and we don’t sell ads.
If you’d like to contribute, head on over to our call for contributors. You can also sign up to receive our weekly newsletters (Deep Learning Weekly and Heartbeat), join us on Slack, and follow Fritz AI on Twitter for all the latest in mobile machine learning.