The FASTEST state-of-the-art algorithm for time series classification with Python


Most state-of-the-art (SOTA) time series classification methods are limited by high computational complexity. This makes them slow to train even on smaller datasets and effectively unusable on large datasets.

Recently, ROCKET (RandOm Convolutional KErnel Transform) has achieved SOTA accuracy in a fraction of the time required by other SOTA time series classifiers. ROCKET transforms time series into features using random convolutional kernels and passes the features to a linear classifier.

MiniRocket is even faster!

MiniRocket (MINImally RandOm Convolutional KErnel Transform) is a (nearly) deterministic reformulation of Rocket that is up to 75 times faster on larger datasets and boasts roughly equivalent accuracy.

MiniRocket is the new default variant of Rocket

On the 108 datasets in the UCR archive, Rocket ran in roughly 2 hours on a single CPU core; MiniRocket took only 8 minutes. For comparison, the next fastest SOTA algorithm (cBOSS) took approximately 14 hours.

Quick Intro to Rocket

At its core, Rocket is a method for time series transformation, or feature extraction. The extracted features contain information related to series class membership, which can be modeled by a linear classifier.

Rocket transforms time series by first convolving each series with 10,000 random convolutional kernels. The random convolutional kernels have random length, weights, bias, dilation, and padding.

Then Rocket separately applies global max pooling and PPV ("proportion of positive values") pooling to the convolutional output to produce 2 features per kernel, one for each pooling method. This results in 20,000 features per input time series.
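To make this concrete, here is a minimal numpy sketch of the two pooling operations for a single random kernel. The kernel and bias below are illustrative stand-ins, not Rocket's exact sampling scheme:

import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=100)      # one input time series
weights = rng.normal(size=9)  # random kernel weights (illustrative)
bias = rng.uniform(-1, 1)     # random bias

# Convolve the series with the kernel and subtract the bias
Z = np.convolve(X, weights, mode="valid") - bias

# Two features per kernel: global max pooling and PPV pooling
max_feature = Z.max()
ppv_feature = np.mean(Z > 0)  # proportion of positive values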

The proportion of positive values captures the prevalence of the pattern detected by the kernel. This value is the most critical element of Rocket and contributes most to its high accuracy.

PPV(Z) = (1/n) * Σ [z_i > 0], where z_i is the output of the convolution operation (Z = X * W − b)

The features extracted by Rocket are then used to train a linear classifier.

For more background on Rocket, see my earlier article:

ROCKET: Fast and Accurate Time Series Classification

How is MiniRocket Different from Rocket?

Both MiniRocket and Rocket rely on pooling the convolution output using PPV, the proportion of positive values. As a quick refresher, you can define PPV as:

PPV = proportion of values that are greater than 0, after convolving X with weights W and subtracting the bias b.

Kernel hyperparameters

MiniRocket uses a small, fixed set of convolutional kernels instead of kernels with random hyperparameters.

In general, MiniRocket minimizes the number of hyperparameter options for each kernel. The set of MiniRocket kernels is kept as small as possible, while still maintaining accuracy, in order to maximize computational efficiency.

Length Hyperparameter

Rocket randomly selects kernel length from the values 7, 9, and 11; MiniRocket only uses kernels of length 9. This also means input series must have length 9 or greater, or be padded to a length of at least 9.

Weight Hyperparameter

Rocket randomly selects weights from a Normal distribution with mean 0 and standard deviation 1; MiniRocket restricts the weight values to either -1 or 2. MiniRocket further restricts its set of kernels to those that have exactly 3 weights of 2 (e.g. [-1, -1, -1, -1, -1, -1, 2, 2, 2]).

The fact that MiniRocket uses exactly 2 fixed weight values is critical to its speed optimizations

The exact values and scale of the two selected weights are not important, so long as the kernel weights sum to 0. This ensures that the kernels are sensitive only to the relative magnitude of the input values.

Thus, MiniRocket uses a small, fixed set of 84 kernels and is almost entirely deterministic.
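The count of 84 follows directly from the weight constraints: a length-9 kernel with exactly 3 weights of 2 can place those weights in C(9, 3) = 84 ways. A minimal sketch of enumerating them:

import numpy as np
from itertools import combinations

# All 84 MiniRocket kernels: length 9, six weights of -1 and three of 2,
# one kernel per choice of the three positions that receive weight 2
kernels = []
for positions in combinations(range(9), 3):
    kernel = np.full(9, -1.0)
    kernel[list(positions)] = 2.0
    kernels.append(kernel)
kernels = np.stack(kernels)

print(kernels.shape)
>> (84, 9)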

Bias Hyperparameter

Rocket randomly selects bias from a Uniform distribution between -1 and 1; MiniRocket simply samples bias values from the convolution output.

For each convolutional kernel, the bias value is drawn from the quantiles of the convolutional output from one random training example.

This is the only random component of MiniRocket.
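A sketch of the idea in numpy (simplified: it ignores dilation, and uses evenly spaced quantiles, whereas the real implementation uses a different quantile schedule; the function and variable names here are mine):

import numpy as np

rng = np.random.default_rng(0)

def sample_biases(X_train, kernel, n_biases=4):
    # The only randomness: which training example the quantiles come from
    example = X_train[rng.integers(len(X_train))]
    Z = np.convolve(example, kernel, mode="valid")
    # Draw bias values from quantiles of the convolution output
    qs = np.arange(1, n_biases + 1) / (n_biases + 1)
    return np.quantile(Z, qs)

X_train = rng.normal(size=(10, 100))  # toy training set
kernel = np.full(9, -1.0)
kernel[:3] = 2.0
biases = sample_biases(X_train, kernel)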

Dilation Hyperparameter

Dilation allows a smaller kernel to be applied to a wider window of the data.

Figure source: Multi-Scale Context Aggregation by Dilated Convolutions (arXiv:1511.07122)

Rocket randomly selects a value for dilation; MiniRocket constrains the maximum number of dilations per kernel to 32. Larger dilation values make the transform less efficient and do not improve accuracy.
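One way to picture dilation is as inserting gaps between the kernel weights, as in the sketch below. This is for intuition only; in practice the transform indexes the input with a stride rather than materializing zero-padded kernels:

import numpy as np

def dilate_kernel(kernel, dilation):
    # Insert (dilation - 1) zeros between weights, so a length-9 kernel
    # spans a window of (9 - 1) * dilation + 1 input values
    dilated = np.zeros((len(kernel) - 1) * dilation + 1)
    dilated[::dilation] = kernel
    return dilated

kernel = np.full(9, -1.0)
kernel[3:6] = 2.0
print(len(dilate_kernel(kernel, 1)), len(dilate_kernel(kernel, 4)))
>> 9 33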

Pooling of convolutional output

Both Rocket and MiniRocket apply PPV pooling to the convolutional output to generate output features.

Rocket additionally applies global max pooling to the convolutional output to generate additional features. MiniRocket drops global max pooling because, given the other changes, it does not improve accuracy. As a result, MiniRocket generates half as many features as Rocket while maintaining equivalent accuracy.

How is MiniRocket Faster than Rocket?

MiniRocket significantly speeds up the transform through 4 key optimizations, made possible by the properties of the small, fixed set of kernels and by PPV.

Compute PPV for W and −W at the same time

This optimization takes advantage of the mathematical properties of the fixed kernels and PPV.

The basic idea is that PPV = 1-PNV (proportion of negative values). If you calculate PPV for a kernel, you get PNV “for free”, without running an additional convolution. Since PNV is just the PPV for the inverted kernel, this allows MiniRocket to double the number of kernels applied to a series without increasing the number of computations.

Practically, once you compute PPV for the set of kernels with weights α = −1 and β = 2, then you can quickly compute PNV = 1-PPV. The PNV is the PPV for the inverted kernels with weights α = 1 and β = −2.
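In numpy terms, the sketch below shows both features coming out of a single convolution (ignoring exact zeros in the output; kernel and bias values are illustrative):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=100)

W = np.full(9, -1.0)
W[:3] = 2.0  # weights alpha = -1 and beta = 2
b = 0.5

Z = np.convolve(X, W, mode="valid") - b

ppv = np.mean(Z > 0)  # feature for kernel W with bias b
pnv = 1.0 - ppv       # feature for the inverted kernel -W, for free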

Reuse the convolution output to compute multiple features

For smaller dilations, the same kernel is reused to compute multiple features, for example with different bias values. Computation time is reduced by reusing the convolution output from one kernel across all of these features.

Avoid multiplications in the convolution operation

Addition operations are faster than multiplication operations. Because the kernel weights are restricted to 2 values, most of the multiplication operations can be factored out mathematically and replaced with addition.

The core intuition is this: αX and βX only need to be computed once per input series because there are no other weight values. These values can then be reused to complete the convolution operation with addition alone.
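A sketch of this idea for dilation 1, checked against an ordinary correlation (the function and variable names are mine, not the paper's):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=100)

alpha_X = -X       # computed once per input series
beta_X = 2 * X     # computed once per input series

def convolve_by_addition(beta_positions, k=9):
    # Assemble the convolution output with additions alone, by picking
    # the precomputed row (alpha_X or beta_X) for each kernel position
    n_out = len(X) - k + 1
    Z = np.zeros(n_out)
    for j in range(k):
        row = beta_X if j in beta_positions else alpha_X
        Z = Z + row[j:j + n_out]
    return Z

kernel = np.full(9, -1.0)
kernel[[0, 1, 2]] = 2.0
print(np.allclose(convolve_by_addition({0, 1, 2}),
                  np.correlate(X, kernel, mode="valid")))
>> True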

For each dilation, compute all kernels (almost) “at once”

This optimization uses clever math to further precompute and reuse convolutional outputs. Instead of computing the convolution separately for each of the 84 kernels, MiniRocket first computes one shared sum per dilation: the output of a kernel with every weight equal to α = −1. Each individual kernel's output is then recovered by adding γX = (β − α)X = 3X at the three positions that carry weight β = 2.
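A sketch of this shared-sum trick for dilation 1 (again with my own variable names, not the paper's code):

import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
X = rng.normal(size=100)
k = 9
n_out = len(X) - k + 1

A = -X      # alpha * X, shared by all kernels
G = 3 * X   # gamma * X, where gamma = beta - alpha = 3

# One shared sum: the convolution output of the all-(-1) kernel
C_alpha = sum(A[j:j + n_out] for j in range(k))

# Each of the 84 kernels then needs only 3 additions on top of it
for positions in combinations(range(k), 3):
    C = C_alpha + sum(G[j:j + n_out] for j in positions)
    # C is the convolution output for this kernel; apply PPV pooling, etc.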

How to use MiniRocket with Python?

Like Rocket, the MiniRocket transform is implemented in the sktime Python package.

Sktime: a Unified Python Library for Time Series Machine Learning

The following code example is adapted from the sktime MiniRocket Demo.

First, load the required packages.

import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sktime.datasets import load_arrow_head # univariate dataset
from sktime.transformations.panel.rocket import MiniRocket

Next, set up the training and test data. In this case, I load the univariate ArrowHead dataset; note that the larger "test" split is loaded as training data and the smaller "train" split as test data. Input series must be of length 9 or greater. If your series are shorter, pad them with sktime's PaddingTransformer, as sketched below.
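The padding step might look like this (the import path is from recent sktime versions and may differ in yours; X_short is a placeholder for your own series data):

from sktime.transformations.panel.padder import PaddingTransformer

padder = PaddingTransformer(pad_length=9)  # pad every series to length >= 9
X_padded = padder.fit_transform(X_short)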

A benefit of MiniRocket is that it is not necessary to normalize the input series, unlike with Rocket.

X_train, y_train = load_arrow_head(split="test", return_X_y=True)
X_test, y_test = load_arrow_head(split="train", return_X_y=True)
print(X_train.shape, X_test.shape)
>> (175, 1) (36, 1)

Transform the training data using the MiniRocket transform. From its fixed set of 84 kernels, combined with multiple dilations and bias values, MiniRocket creates roughly 10,000 features per series.

minirocket = MiniRocket() 
minirocket.fit(X_train)
X_train_transform = minirocket.transform(X_train)
X_train_transform.shape
>> (175, 10000)

Initialize and train a linear classifier from scikit-learn. The authors recommend using RidgeClassifierCV for smaller datasets (<20k training examples). For larger datasets, use logistic regression trained with stochastic gradient descent: SGDClassifier(loss='log').

classifier = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10), normalize=True)
classifier.fit(X_train_transform, y_train)

Finally, to score the trained model and generate predictions, transform the test data using MiniRocket and call the trained model.

X_test_transform = minirocket.transform(X_test)
classifier.score(X_test_transform, y_test)
>> 0.9167

For multivariate series, the steps are the same, but use the MiniRocketMultivariate class instead of MiniRocket.
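For example, using the multivariate BasicMotions dataset bundled with sktime, a minimal sketch following the same pattern as above:

from sktime.datasets import load_basic_motions  # multivariate dataset
from sktime.transformations.panel.rocket import MiniRocketMultivariate

X_train_mv, y_train_mv = load_basic_motions(split="train", return_X_y=True)

minirocket_mv = MiniRocketMultivariate()
minirocket_mv.fit(X_train_mv)
X_train_mv_transform = minirocket_mv.transform(X_train_mv)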

Thank you for reading!

References

MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification (arXiv:2012.08791)

angus924/minirocket on GitHub: MINIROCKET: A Very Fast (Almost) Deterministic Transform for Time Series Classification

sktime MiniRocket demo notebook: minirocket.ipynb in alan-turing-institute/sktime on GitHub

