diff --git a/examples/classification/classification.ipynb b/examples/classification/classification.ipynb index 99bf4bf2a6..8ec2f4563b 100644 --- a/examples/classification/classification.ipynb +++ b/examples/classification/classification.ipynb @@ -1,678 +1,931 @@ { - "cells": [ - { - "cell_type": "markdown", - "source": [ - "# Time Series Classification\n", - "\n", - "Time Series Classification (TSC) involves training a model from a collection\n", - " of time series (real valued, ordered, data) in order to predict a discrete target\n", - " variable. For example, we might want to build a model that can predict whether a patient\n", - " is sick based on their ECG reading, or a persons type of movement based on the trace\n", - " of the position of their hand. This notebook gives a quick guide to TSC to get you\n", - " started using aeon time series classifiers. If you can use scikit-learn, it should\n", - " be easy, because the basic usage is identical.\n", - "\n", - "\"time" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "## Classification Notebooks\n", - "\n", - "This note book gives an overview of TSC. More specific notebooks on TSC are base on\n", - "the type of representation or transformation they use:\n", - "\n", - "- [Convolution based](convolution_based.ipynb)\n", - "- [Deep learning](deep_learning.ipynb)\n", - "- [Dictionary based](dictionary_based.ipynb)\n", - "- [Distance based](distance_based.ipynb)\n", - "- [Feature based](feature_based.ipynb)\n", - "- [Interval based](interval_based.ipynb)\n", - "- [Shapelet based](shapelet_based.ipynb)\n", - "- [Hybrid](hybrid.ipynb)\n", - "- [Early classification](early_classification.ipynb)\n" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "## Data Storage and Problem Types\n", - "\n", - "Time series can be univariate (each observation is a single value) or multivariate\n", - "(each observation is a vector). 
For example, an ECG reading from a single\n", - "sensor is a univariate series, but a motion trace of from a smart watch would be\n", - "multivariate, with at least three dimensions (x,y,z co-ordinates). The image above is\n", - " a univariate problem: each series has its own label. The dimension of the time\n", - " series instance is also often called the channel. We recommend storing time series\n", - " in 3D numpy array of shape `(n_cases, n_channels, n_timepoints)` and\n", - " where possible our single problem loaders will return a\n", - " 3D numpy. Unequal length classification problems are stored in a list of 2D numpy\n", - " arrays. More details on data storage can be found in the [data storage](../datasets/datasets.ipynb) notebook." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "code", - "execution_count": 1, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ArrowHead series of type and shape (36, 1, 251)\n", - "Motions type of shape (40,)\n" - ] - } - ], - "source": [ - "# Plotting and data loading imports used in this notebook\n", - "import matplotlib.pyplot as plt\n", - "\n", - "from aeon.datasets import load_arrow_head, load_basic_motions\n", - "\n", - "arrow, arrow_labels = load_arrow_head(split=\"train\")\n", - "motions, motions_labels = load_basic_motions(split=\"train\")\n", - "print(f\"ArrowHead series of type {type(arrow)} and shape {arrow.shape}\")\n", - "print(f\"Motions type {type(motions)} of shape {motions_labels.shape}\")" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "We use 3D numpy even if the data is univariate: even though classifiers\n", - "can work using a 2D array of shape `(n_cases, n_timepoints)`, this 2D shape can get\n", - "confused with single multivariate time series, which are of shape `(n_channels, n_timepoints)`.\n", - "Hence, to differentiate both cases, we enforce the 3D format `(n_cases, n_channels,\n", - 
"n_timepoints)` to avoid any confusion.\n", - "\n", - "If your series are unequal length, have missing values or are\n", - " sampled at irregular time intervals, you should read the note book\n", - " on [data preprocessing](../transformations/preprocessing.ipynb).\n", - "\n", - "The [TSC dataset archive](https://timeseriesclassification.com/) contains a\n", - "large number of example TSC problems that have been used thousands of times in the\n", - "literature to assess TSC algorithms. These datasets have certain characteristics that\n", - "influence what data structure we use to store them in memory.\n", - "\n", - "Most datasets in the archive contain time series all the same length. For example,\n", - "the [ArrowHead dataset](https://timeseriesclassification.com/description.php?Dataset=ArrowHead) we have just loaded consists of outlines of the images of\n", - "arrow heads. The classification of projectile points is an important topic in anthropology.\n", - "\n", - "\"arrow\n", - "\n", - "The shapes of the projectile points are converted into a sequence using the\n", - "angle-based method as described in this [blog post](https://izbicki.me/blog/converting-images-into-time-series-for-data-mining.html) about converting images into time series for data mining.\n", - "\n", - "\"from\n", - "\n", - "Each instance consists of a single time series (i.e. the problem is univariate) of\n", - "equal length and a class label based on shape distinctions such as the presence and\n", - "location of a notch in the arrow. The data set consists of 210 instances, by default split into 36 train and 175 test instances.\n", - "\n", - "The [BasicMotions dataset](https://timeseriesclassification.com/description.php?Dataset=BasicMotions) is an example of a multivariate TSC problem. It was generated\n", - " as part of a project where four students performed four activities whilst wearing a\n", - " smartwatch. The watch collects 3D accelerometer and 3D gyroscope data. 
Each instance\n", - " involved a subject performing one of four tasks (walking, resting, running and\n", - " badminton) for ten seconds. Time series in this data set have six dimensions or\n", - " channels." - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "data": { - "text/plain": "[]" - }, - "execution_count": 2, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "text/plain": "
", - "image/png": "" - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "plt.title(\n", - " f\"First and second dimensions of the first instance in BasicMotions data, \"\n", - " f\"(student {motions_labels[0]})\"\n", - ")\n", - "plt.plot(motions[0][0])\n", - "plt.plot(motions[0][1])" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "data": { - "text/plain": "[]" - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "text/plain": "
", - "image/png": "" - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "plt.title(f\"First instance in ArrowHead data (class {arrow_labels[0]})\")\n", - "plt.plot(arrow[0, 0])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "It is possible to use a standard `sklearn` classifier for univariate, equal length\n", - "classification problems, but it is unlikely to perform as well as bespoke time series\n", - " classifiers, since `sklearn` classifiers ignore the sequence information in the variables.\n", - "\n", - "To apply `sklearn` classifiers directly, the data needs to be reshaped into a 2D\n", - "numpy array. We also offer the ability to load univariate TSC problems directly in 2D\n", - " arrays although we recommend using 3D numpy of shape `(n_channels, 1, n_timepoints)\n", - " ` for univariate collections." - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "outputs": [ - { - "data": { - "text/plain": "0.7028571428571428" - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from sklearn.ensemble import RandomForestClassifier\n", - "from sklearn.metrics import accuracy_score\n", - "\n", - "rand_forest = RandomForestClassifier(n_estimators=100)\n", - "arrow2d = arrow.squeeze()\n", - "arrow_test, arrow_test_labels = load_arrow_head(split=\"test\", return_type=\"numpy2d\")\n", - "rand_forest.fit(arrow2d, arrow_labels)\n", - "y_pred = rand_forest.predict(arrow_test)\n", - "accuracy_score(arrow_test_labels, y_pred)" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "## Time Series Classifiers in aeon\n", - "\n", - "`aeon` contains the state of the art in time series classifiers in the package\n", - "`classification`. These are grouped based on the data representation used to find\n", - "discriminatory features. 
We provide a separate notebook for each of type:\n", - "[convolution based](convolution_based.ipynb), [deep learning](deep_learning.ipynb), [distance based](distance_based.ipynb), [dictionary based](dictionary_based.ipynb),\n", - "[feature_based](feature_based.ipynb), [hybrid](hybrid.ipynb), [interval based](interval_based.ipynb), and [shapelet based](shapelet_based.ipynb). We also\n", - "provide some\n", - "standard classifiers not available in scikit learn in the sklearn package.\n", - "We show the simplest use cases for classifiers and demonstrate how to build bespoke\n", - "pipelines for time series classification. An accurate and relatively\n", - "fast classifier is the [ROCKET](https://link.springer.com/article/10.1007/s10618-020-00701-z) classifier. ROCKET is a convolution based algorithm\n", - "described in detail in the [convolution based](convolution_based.ipynb) note book." - ] - }, - { - "cell_type": "code", - "metadata": { - "collapsed": false, - "ExecuteTime": { - "end_time": "2024-11-16T19:16:46.486243Z", - "start_time": "2024-11-16T19:15:42.973051Z" - } - }, - "source": [ - "from aeon.classification.convolution_based import RocketClassifier\n", - "\n", - "rocket = RocketClassifier(n_kernels=2000)\n", - "rocket.fit(arrow, arrow_labels)\n", - "y_pred = rocket.predict(arrow_test)\n", - "\n", - "accuracy_score(arrow_test_labels, y_pred)" - ], - "outputs": [ - { - "ename": "NameError", - "evalue": "name 'arrow' is not defined", - "output_type": "error", - "traceback": [ - "\u001B[1;31m---------------------------------------------------------------------------\u001B[0m", - "\u001B[1;31mNameError\u001B[0m Traceback (most recent call last)", - "Cell \u001B[1;32mIn[1], line 4\u001B[0m\n\u001B[0;32m 1\u001B[0m \u001B[38;5;28;01mfrom\u001B[39;00m \u001B[38;5;21;01maeon\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01mclassification\u001B[39;00m\u001B[38;5;21;01m.\u001B[39;00m\u001B[38;5;21;01mconvolution_based\u001B[39;00m 
\u001B[38;5;28;01mimport\u001B[39;00m RocketClassifier\n\u001B[0;32m 3\u001B[0m rocket \u001B[38;5;241m=\u001B[39m RocketClassifier(n_kernels\u001B[38;5;241m=\u001B[39m\u001B[38;5;241m2000\u001B[39m)\n\u001B[1;32m----> 4\u001B[0m rocket\u001B[38;5;241m.\u001B[39mfit(\u001B[43marrow\u001B[49m, arrow_labels)\n\u001B[0;32m 5\u001B[0m y_pred \u001B[38;5;241m=\u001B[39m rocket\u001B[38;5;241m.\u001B[39mpredict(arrow_test)\n\u001B[0;32m 7\u001B[0m accuracy_score(arrow_test_labels, y_pred)\n", - "\u001B[1;31mNameError\u001B[0m: name 'arrow' is not defined" - ] - } - ], - "execution_count": 1 - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "A slower but generally more accurate classifier for time series classification is\n", - "version 2 of the [HIVE-COTE](https://link.springer.com/article/10.1007/s10994-021-06057-9) algorithm.\n", - "(HC2) is described in the [hybrid notebook](hybrid.ipynb) notebook. HC2 is particularly\n", - "slow\n", - "on small problems like these examples. 
However, it can be\n", - "configured with an approximate maximum run time as follows (it may take a bit longer\n", - "than 12 seconds to run this cell, very short times are approximate since there is a\n", - "minimum amount of work the classifier needs to do):" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "data": { - "text/plain": "0.8685714285714285" - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from aeon.classification.hybrid import HIVECOTEV2\n", - "\n", - "hc2 = HIVECOTEV2(time_limit_in_minutes=0.2)\n", - "hc2.fit(arrow, arrow_labels)\n", - "y_pred = hc2.predict(arrow_test)\n", - "\n", - "accuracy_score(arrow_test_labels, y_pred)" - ] - }, - { - "cell_type": "markdown", - "source": [], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "## Multivariate Classification\n", - "To use ``sklearn`` classifiers directly on multivariate data, one option is to flatten\n", - "the data so that the 3D array `(n_cases, n_channels, n_timepoints)` becomes a 2D array\n", - "of shape `(n_cases, n_channels*n_timepoints)`." 
- ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "code", - "execution_count": 7, - "outputs": [ - { - "data": { - "text/plain": "0.925" - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "motions_test, motions_test_labels = load_basic_motions(split=\"test\")\n", - "motions2d = motions.reshape(motions.shape[0], motions.shape[1] * motions.shape[2])\n", - "motions2d_test = motions_test.reshape(\n", - " motions_test.shape[0], motions_test.shape[1] * motions_test.shape[2]\n", - ")\n", - "rand_forest.fit(motions2d, motions_labels)\n", - "y_pred = rand_forest.predict(motions2d_test)\n", - "accuracy_score(motions_test_labels, y_pred)" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "However, many ``aeon`` classifiers, including ROCKET and HC2, are configured to\n", - "work with multivariate input. This works exactly like univariate classification. For example:" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "code", - "execution_count": 8, - "outputs": [ - { - "data": { - "text/plain": "1.0" - }, - "execution_count": 8, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "rocket.fit(motions, motions_labels)\n", - "y_pred = rocket.predict(motions_test)\n", - "accuracy_score(motions_test_labels, y_pred)" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "source": [ - "A list of classifiers capable of handling multivariate classification can be obtained\n", - " with this code" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "code", - "execution_count": 9, - "outputs": [ - { - "data": { - "text/plain": "[('Arsenal', aeon.classification.convolution_based._arsenal.Arsenal),\n ('CNNClassifier', aeon.classification.deep_learning.cnn.CNNClassifier),\n ('CanonicalIntervalForestClassifier',\n 
aeon.classification.interval_based._cif.CanonicalIntervalForestClassifier),\n ('Catch22Classifier',\n aeon.classification.feature_based._catch22.Catch22Classifier),\n ('ChannelEnsembleClassifier',\n aeon.classification.compose._channel_ensemble.ChannelEnsembleClassifier),\n ('DrCIFClassifier',\n aeon.classification.interval_based._drcif.DrCIFClassifier),\n ('DummyClassifier', aeon.classification._dummy.DummyClassifier),\n ('ElasticEnsemble',\n aeon.classification.distance_based._elastic_ensemble.ElasticEnsemble),\n ('EncoderClassifier',\n aeon.classification.deep_learning.encoder.EncoderClassifier),\n ('FCNClassifier', aeon.classification.deep_learning.fcn.FCNClassifier),\n ('FreshPRINCEClassifier',\n aeon.classification.feature_based._fresh_prince.FreshPRINCEClassifier),\n ('HIVECOTEV2', aeon.classification.hybrid._hivecote_v2.HIVECOTEV2),\n ('InceptionTimeClassifier',\n aeon.classification.deep_learning.inception_time.InceptionTimeClassifier),\n ('IndividualInceptionClassifier',\n aeon.classification.deep_learning.inception_time.IndividualInceptionClassifier),\n ('IndividualOrdinalTDE',\n aeon.classification.ordinal_classification._ordinal_tde.IndividualOrdinalTDE),\n ('IndividualTDE', aeon.classification.dictionary_based._tde.IndividualTDE),\n ('IntervalForestClassifier',\n aeon.classification.interval_based._interval_forest.IntervalForestClassifier),\n ('KNeighborsTimeSeriesClassifier',\n aeon.classification.distance_based._time_series_neighbors.KNeighborsTimeSeriesClassifier),\n ('MLPClassifier', aeon.classification.deep_learning.mlp.MLPClassifier),\n ('MUSE', aeon.classification.dictionary_based._muse.MUSE),\n ('OrdinalTDE',\n aeon.classification.ordinal_classification._ordinal_tde.OrdinalTDE),\n ('RDSTClassifier', aeon.classification.shapelet_based._rdst.RDSTClassifier),\n ('RSTSF', aeon.classification.interval_based._rstsf.RSTSF),\n ('RandomIntervalClassifier',\n aeon.classification.interval_based._interval_pipelines.RandomIntervalClassifier),\n 
('RandomIntervalSpectralEnsembleClassifier',\n aeon.classification.interval_based._rise.RandomIntervalSpectralEnsembleClassifier),\n ('ResNetClassifier',\n aeon.classification.deep_learning.resnet.ResNetClassifier),\n ('RocketClassifier',\n aeon.classification.convolution_based._rocket_classifier.RocketClassifier),\n ('ShapeletTransformClassifier',\n aeon.classification.shapelet_based._stc.ShapeletTransformClassifier),\n ('SignatureClassifier',\n aeon.classification.feature_based._signature_classifier.SignatureClassifier),\n ('SummaryClassifier',\n aeon.classification.feature_based._summary_classifier.SummaryClassifier),\n ('SupervisedIntervalClassifier',\n aeon.classification.interval_based._interval_pipelines.SupervisedIntervalClassifier),\n ('SupervisedTimeSeriesForest',\n aeon.classification.interval_based._stsf.SupervisedTimeSeriesForest),\n ('TSFreshClassifier',\n aeon.classification.feature_based._tsfresh_classifier.TSFreshClassifier),\n ('TapNetClassifier',\n aeon.classification.deep_learning.tapnet.TapNetClassifier),\n ('TemporalDictionaryEnsemble',\n aeon.classification.dictionary_based._tde.TemporalDictionaryEnsemble),\n ('TimeSeriesForestClassifier',\n aeon.classification.interval_based._tsf.TimeSeriesForestClassifier)]" - }, - "execution_count": 9, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from aeon.utils.discovery import all_estimators\n", - "\n", - "all_estimators(\n", - " tag_filter={\"capability:multivariate\": True},\n", - " type_filter=\"classifier\",\n", - ")" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "An alternative for MTSC is to build a univariate classifier on each channel, then\n", - "ensemble. Channel ensembling can be easily done via ``ClassifierChannelEnsemble``\n", - "which fits classifiers independently to specified channels, then\n", - "combines predictions through a voting scheme. 
The example below builds a DrCIF\n", - "classifier on the first channel and a RocketClassifier on the fourth and fifth\n", - "dimensions, ignoring the second, third and sixth." - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "data": { - "text/plain": "0.925" - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from aeon.classification.compose import ClassifierChannelEnsemble\n", - "from aeon.classification.interval_based import DrCIFClassifier\n", - "\n", - "cls = ClassifierChannelEnsemble(\n", - " classifiers=[\n", - " (\"DrCIF0\", DrCIFClassifier(n_estimators=5, n_intervals=2)),\n", - " (\"ROCKET3\", RocketClassifier(n_kernels=1000)),\n", - " ],\n", - " channels=[[0], [3, 4]],\n", - ")\n", - "\n", - "cls.fit(motions, motions_labels)\n", - "y_pred = cls.predict(motions_test)\n", - "\n", - "accuracy_score(motions_test_labels, y_pred)" - ] - }, - { - "cell_type": "markdown", - "source": [ - "## sklearn Compatibility\n", - "\n", - "`aeon` classifiers are compatible with `sklearn` model selection and\n", - "composition tools using `aeon` data formats. For example, cross-validation can\n", - "be performed using the `sklearn` `cross_val_score` and `KFold` functionality:" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": "array([0.88888889, 0.66666667, 0.88888889, 0.77777778])" - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from sklearn.model_selection import KFold, cross_val_score\n", - "\n", - "cross_val_score(rocket, arrow, y=arrow_labels, cv=KFold(n_splits=4))" - ] - }, - { - "cell_type": "markdown", - "source": [ - "Parameter tuning can be done using `sklearn` `GridSearchCV`. 
For example, we can tune\n", - " the _k_ and distance measure for a K-NN classifier:" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": "0.8" - }, - "execution_count": 12, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from sklearn.model_selection import GridSearchCV\n", - "\n", - "from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier\n", - "\n", - "knn = KNeighborsTimeSeriesClassifier()\n", - "param_grid = {\"n_neighbors\": [1, 5], \"distance\": [\"euclidean\", \"dtw\"]}\n", - "parameter_tuning_method = GridSearchCV(knn, param_grid, cv=KFold(n_splits=4))\n", - "\n", - "parameter_tuning_method.fit(arrow, arrow_labels)\n", - "y_pred = parameter_tuning_method.predict(arrow_test)\n", - "\n", - "accuracy_score(arrow_test_labels, y_pred)" - ] - }, - { - "cell_type": "markdown", - "source": [ - "Probability calibration is possible with the `sklearn` `CalibratedClassifierCV`:" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": { - "collapsed": false - }, - "outputs": [ - { - "data": { - "text/plain": "0.7714285714285715" - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" + "cells": [ + { + "cell_type": "markdown", + "source": [ + "# Time Series Classification\n", + "\n", + "Time Series Classification (TSC) involves training a model from a collection\n", + " of time series (real-valued, ordered data) in order to predict a discrete target\n", + " variable. For example, we might want to build a model that can predict whether a patient\n", + " is sick based on their ECG reading, or a person's type of movement based on the trace\n", + " of the position of their hand. This notebook gives a quick guide to TSC to get you\n", + " started using aeon time series classifiers.
If you can use scikit-learn, it should\n", + " be easy, because the basic usage is identical.\n", + "\n", + "\"time" + ], + "metadata": { + "collapsed": false, + "id": "_pBlXBeTh5IG" + } + }, + { + "cell_type": "markdown", + "source": [ + "## Classification Notebooks\n", + "\n", + "This notebook gives an overview of TSC. More specific notebooks on TSC are based on\n", + "the type of representation or transformation they use:\n", + "\n", + "- [Convolution based](convolution_based.ipynb)\n", + "- [Deep learning](deep_learning.ipynb)\n", + "- [Dictionary based](dictionary_based.ipynb)\n", + "- [Distance based](distance_based.ipynb)\n", + "- [Feature based](feature_based.ipynb)\n", + "- [Interval based](interval_based.ipynb)\n", + "- [Shapelet based](shapelet_based.ipynb)\n", + "- [Hybrid](hybrid.ipynb)\n", + "- [Early classification](early_classification.ipynb)\n" + ], + "metadata": { + "collapsed": false, + "id": "weha73tPh5IH" + } + }, + { + "cell_type": "markdown", + "source": [ + "## Data Storage and Problem Types\n", + "\n", + "Time series can be univariate (each observation is a single value) or multivariate\n", + "(each observation is a vector). For example, an ECG reading from a single\n", + "sensor is a univariate series, but a motion trace from a smart watch would be\n", + "multivariate, with at least three dimensions (x, y, z co-ordinates). The image above is\n", + " a univariate problem: each series has its own label. The dimension of the time\n", + " series instance is also often called the channel. We recommend storing time series\n", + " in a 3D numpy array of shape `(n_cases, n_channels, n_timepoints)` and,\n", + " where possible, our single problem loaders will return a\n", + " 3D numpy array. Unequal length classification problems are stored in a list of 2D numpy\n", + " arrays. More details on data storage can be found in the [data storage](../datasets/datasets.ipynb) notebook."
+ ], + "metadata": { + "collapsed": false, + "id": "EyjESzTQh5II" + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "ArrowHead series of type and shape (36, 1, 251)\n", + "Motions type of shape (40,)\n" + ] + } + ], + "source": [ + "# Plotting and data loading imports used in this notebook\n", + "import matplotlib.pyplot as plt\n", + "\n", + "from aeon.datasets import load_arrow_head, load_basic_motions\n", + "\n", + "arrow, arrow_labels = load_arrow_head(split=\"train\")\n", + "motions, motions_labels = load_basic_motions(split=\"train\")\n", + "print(f\"ArrowHead series of type {type(arrow)} and shape {arrow.shape}\")\n", + "print(f\"Motions type {type(motions)} of shape {motions_labels.shape}\")" + ], + "metadata": { + "id": "bjW-qRxOh5II", + "outputId": "a17f6f06-04b2-4fed-877e-92ef9680cdef", + "colab": { + "base_uri": "https://localhost:8080/" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "We use 3D numpy even if the data is univariate: even though classifiers\n", + "can work using a 2D array of shape `(n_cases, n_timepoints)`, this 2D shape can be\n", + "confused with a single multivariate time series, which has shape `(n_channels, n_timepoints)`.\n", + "Hence, to distinguish the two cases, we enforce the 3D format `(n_cases, n_channels,\n", + "n_timepoints)`.\n", + "\n", + "If your series are unequal length, have missing values or are\n", + " sampled at irregular time intervals, you should read the notebook\n", + " on [data preprocessing](../transformations/preprocessing.ipynb).\n", + "\n", + "The [TSC dataset archive](https://timeseriesclassification.com/) contains a\n", + "large number of example TSC problems that have been used thousands of times in the\n", + "literature to assess TSC algorithms.
These datasets have certain characteristics that\n", + "influence what data structure we use to store them in memory.\n", + "\n", + "Most datasets in the archive contain time series that are all the same length. For example,\n", + "the [ArrowHead dataset](https://timeseriesclassification.com/description.php?Dataset=ArrowHead) we have just loaded consists of outlines of the images of\n", + "arrow heads. The classification of projectile points is an important topic in anthropology.\n", + "\n", + "\"arrow\n", + "\n", + "The shapes of the projectile points are converted into a sequence using the\n", + "angle-based method as described in this [blog post](https://izbicki.me/blog/converting-images-into-time-series-for-data-mining.html) about converting images into time series for data mining.\n", + "\n", + "\"from\n", + "\n", + "Each instance consists of a single time series (i.e. the problem is univariate) of\n", + "equal length and a class label based on shape distinctions such as the presence and\n", + "location of a notch in the arrow. The data set consists of 211 instances, by default split into 36 train and 175 test instances.\n", + "\n", + "The [BasicMotions dataset](https://timeseriesclassification.com/description.php?Dataset=BasicMotions) is an example of a multivariate TSC problem. It was generated\n", + " as part of a project where four students performed four activities whilst wearing a\n", + " smartwatch. The watch collects 3D accelerometer and 3D gyroscope data. Each instance\n", + " involved a subject performing one of four tasks (walking, resting, running and\n", + " badminton) for ten seconds. Time series in this data set have six dimensions or\n", + " channels."
+ ], + "metadata": { + "collapsed": false, + "id": "pPrsdjsOh5IJ" + } + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "9T5zoVT9h5IJ", + "outputId": "2aa3e84a-9fdd-4cd7-fcff-4f6f8172c5ce", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 469 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "[]" + ] + }, + "metadata": {}, + "execution_count": 7 + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "plt.title(\n", + " f\"First and second dimensions of the first instance in BasicMotions data, \"\n", + " f\"(student {motions_labels[0]})\"\n", + ")\n", + "plt.plot(motions[0][0])\n", + "plt.plot(motions[0][1])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "TtIuima2h5IK", + "outputId": "17310dc6-8ba5-45bb-8e2b-a07f80402bef", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 469 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "[]" + ] + }, + "metadata": {}, + "execution_count": 8 + }, + { + "output_type": "display_data", + "data": { + "text/plain": [ + "
" + ], + "image/png": "\n" + }, + "metadata": {} + } + ], + "source": [ + "plt.title(f\"First instance in ArrowHead data (class {arrow_labels[0]})\")\n", + "plt.plot(arrow[0, 0])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dEZoDQSTh5IK" + }, + "source": [ + "It is possible to use a standard `sklearn` classifier for univariate, equal length\n", + "classification problems, but it is unlikely to perform as well as bespoke time series\n", + " classifiers, since `sklearn` classifiers ignore the sequence information in the variables.\n", + "\n", + "To apply `sklearn` classifiers directly, the data needs to be reshaped into a 2D\n", + "numpy array. We also offer the ability to load univariate TSC problems directly in 2D\n", + " arrays, although we recommend using 3D numpy of shape `(n_cases, 1, n_timepoints)`\n", + " for univariate collections." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0.72" + ] + }, + "metadata": {}, + "execution_count": 9 + } + ], + "source": [ + "from sklearn.ensemble import RandomForestClassifier\n", + "from sklearn.metrics import accuracy_score\n", + "\n", + "rand_forest = RandomForestClassifier(n_estimators=100)\n", + "arrow2d = arrow.squeeze()\n", + "arrow_test, arrow_test_labels = load_arrow_head(split=\"test\", return_type=\"numpy2d\")\n", + "rand_forest.fit(arrow2d, arrow_labels)\n", + "y_pred = rand_forest.predict(arrow_test)\n", + "accuracy_score(arrow_test_labels, y_pred)" + ], + "metadata": { + "id": "qG96PKaCh5IK", + "outputId": "07ae1abe-a9d2-4e19-f515-9ca20e017177", + "colab": { + "base_uri": "https://localhost:8080/" + } + } + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": false, + "id": "r2T8AclIh5IK" + }, + "source": [ + "## Time Series Classifiers in aeon\n", + "\n", + "`aeon` contains the state of the art in time series classifiers in the package\n", + "`classification`.
These are grouped based on the data representation used to find\n", + "discriminatory features. We provide a separate notebook for each type:\n", + "[convolution based](convolution_based.ipynb), [deep learning](deep_learning.ipynb), [distance based](distance_based.ipynb), [dictionary based](dictionary_based.ipynb),\n", + "[feature based](feature_based.ipynb), [hybrid](hybrid.ipynb), [interval based](interval_based.ipynb), and [shapelet based](shapelet_based.ipynb). We also\n", + "provide some\n", + "standard classifiers not available in scikit-learn in the `sklearn` package.\n", + "We show the simplest use cases for classifiers and demonstrate how to build bespoke\n", + "pipelines for time series classification. An accurate and relatively\n", + "fast classifier is the [ROCKET](https://link.springer.com/article/10.1007/s10618-020-00701-z) classifier. ROCKET is a convolution based algorithm\n", + "described in detail in the [convolution based](convolution_based.ipynb) notebook." + ] + }, + { + "cell_type": "code", + "metadata": { + "ExecuteTime": { + "end_time": "2024-11-16T19:16:46.486243Z", + "start_time": "2024-11-16T19:15:42.973051Z" + }, + "id": "2xIrRErYh5IL", + "outputId": "372654b5-3fae-42e8-a315-da9e33ad8e38", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "source": [ + "from aeon.classification.convolution_based import RocketClassifier\n", + "\n", + "rocket = RocketClassifier(n_kernels=2000)\n", + "rocket.fit(arrow, arrow_labels)\n", + "y_pred = rocket.predict(arrow_test)\n", + "\n", + "accuracy_score(arrow_test_labels, y_pred)" + ], + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0.76" + ] + }, + "metadata": {}, + "execution_count": 10 + } + ], + "execution_count": null + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": false, + "id": "t9Ngfjfzh5IL" + }, + "source": [ + "A slower but generally more accurate classifier for time series classification is\n", + "version 2 of the
[HIVE-COTE](https://link.springer.com/article/10.1007/s10994-021-06057-9) algorithm\n", + "(HC2), which is described in the [hybrid](hybrid.ipynb) notebook. HC2 is particularly\n", + "slow\n", + "on small problems like these examples. However, it can be\n", + "configured with an approximate maximum run time as follows (it may take a bit longer\n", + "than 12 seconds to run this cell; very short time limits are approximate since there is a\n", + "minimum amount of work the classifier needs to do):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "u0rqqET8h5IL", + "outputId": "b1347f40-c82b-4ecf-ec72-500b1f7f8a12", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0.8685714285714285" + ] + }, + "metadata": {}, + "execution_count": 11 + } + ], + "source": [ + "from aeon.classification.hybrid import HIVECOTEV2\n", + "\n", + "hc2 = HIVECOTEV2(time_limit_in_minutes=0.2)\n", + "hc2.fit(arrow, arrow_labels)\n", + "y_pred = hc2.predict(arrow_test)\n", + "\n", + "accuracy_score(arrow_test_labels, y_pred)" + ] + }, + { + "cell_type": "markdown", + "source": [ + "The LITETime Classifier is an efficient deep learning-based model for time series classification. It is designed to handle both univariate and multivariate time series data effectively, offering a lightweight architecture and competitive performance. For simplicity, this notebook uses 10 epochs to demonstrate the classifier's functionality. To observe the full performance of deep learning models in aeon, it’s recommended to use the library's default epochs. The reduced epochs here simplify the demonstration and reduce runtime. 
Deep learning approaches for time series classification are further described in the [deep learning notebook](./deep_learning.ipynb).\n" + ], + "metadata": { + "id": "gTQRU2rkuPvw" + } + }, + { + "cell_type": "code", + "source": [ + "from aeon.classification.deep_learning import LITETimeClassifier\n", + "\n", + "lite_time = LITETimeClassifier(n_epochs=10, batch_size=32, random_state=42)\n", + "lite_time.fit(arrow, arrow_labels)\n", + "y_pred = lite_time.predict(arrow_test)\n", + "\n", + "accuracy_score(arrow_test_labels, y_pred)" + ], + "metadata": { + "id": "-nnwMXqtSzzc", + "outputId": "5ca88c72-3d6d-4d0b-90e7-b76da94aa62f", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\u001b[1m6/6\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 122ms/step\n", + "\u001b[1m6/6\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 126ms/step\n", + "\u001b[1m6/6\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 132ms/step\n", + "\u001b[1m6/6\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 122ms/step\n", + "\u001b[1m6/6\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 119ms/step\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0.3942857142857143" + ] + }, + "metadata": {}, + "execution_count": 13 + } + ] + }, + { + "cell_type": "markdown", + "source": [], + "metadata": { + "collapsed": false, + "id": "3y4vwmA1h5IL" + } + }, + { + "cell_type": "markdown", + "source": [ + "## Multivariate Classification\n", + "To use ``sklearn`` classifiers directly on multivariate data, one option is to flatten\n", + "the data so that the 3D array `(n_cases, n_channels, n_timepoints)` becomes a 2D array\n", + "of shape `(n_cases, 
n_channels*n_timepoints)`." + ], + "metadata": { + "collapsed": false, + "id": "OaBVEJmnh5IM" + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0.925" + ] + }, + "metadata": {}, + "execution_count": 14 + } + ], + "source": [ + "motions_test, motions_test_labels = load_basic_motions(split=\"test\")\n", + "motions2d = motions.reshape(motions.shape[0], motions.shape[1] * motions.shape[2])\n", + "motions2d_test = motions_test.reshape(\n", + " motions_test.shape[0], motions_test.shape[1] * motions_test.shape[2]\n", + ")\n", + "rand_forest.fit(motions2d, motions_labels)\n", + "y_pred = rand_forest.predict(motions2d_test)\n", + "accuracy_score(motions_test_labels, y_pred)" + ], + "metadata": { + "id": "1mfxhLaZh5IM", + "outputId": "c0a7278f-7feb-45dc-a337-e0da2bcbbf60", + "colab": { + "base_uri": "https://localhost:8080/" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "However, many ``aeon`` classifiers, including ROCKET and HC2, are configured to\n", + "work with multivariate input. This works exactly like univariate classification. 
For example:" + ], + "metadata": { + "collapsed": false, + "id": "Hc2DrT2Fh5IM" + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "1.0" + ] + }, + "metadata": {}, + "execution_count": 15 + } + ], + "source": [ + "rocket.fit(motions, motions_labels)\n", + "y_pred = rocket.predict(motions_test)\n", + "accuracy_score(motions_test_labels, y_pred)" + ], + "metadata": { + "id": "yXZW8cAch5IM", + "outputId": "f3b7b3b7-8204-4e30-cca8-1f07b4d53d90", + "colab": { + "base_uri": "https://localhost:8080/" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "A list of classifiers capable of handling multivariate classification can be obtained\n", + " with this code:" + ], + "metadata": { + "collapsed": false, + "id": "vW1usODIh5IM" + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "[('Arsenal', aeon.classification.convolution_based._arsenal.Arsenal),\n", + " ('CanonicalIntervalForestClassifier',\n", + " aeon.classification.interval_based._cif.CanonicalIntervalForestClassifier),\n", + " ('Catch22Classifier',\n", + " aeon.classification.feature_based._catch22.Catch22Classifier),\n", + " ('ClassifierChannelEnsemble',\n", + " aeon.classification.compose._channel_ensemble.ClassifierChannelEnsemble),\n", + " ('DisjointCNNClassifier',\n", + " aeon.classification.deep_learning._disjoint_cnn.DisjointCNNClassifier),\n", + " ('DrCIFClassifier',\n", + " aeon.classification.interval_based._drcif.DrCIFClassifier),\n", + " ('DummyClassifier', aeon.classification.dummy.DummyClassifier),\n", + " ('ElasticEnsemble',\n", + " aeon.classification.distance_based._elastic_ensemble.ElasticEnsemble),\n", + " ('EncoderClassifier',\n", + " aeon.classification.deep_learning._encoder.EncoderClassifier),\n", + " ('FCNClassifier', aeon.classification.deep_learning._fcn.FCNClassifier),\n", + " 
('FreshPRINCEClassifier',\n", + " aeon.classification.feature_based._fresh_prince.FreshPRINCEClassifier),\n", + " ('HIVECOTEV2', aeon.classification.hybrid._hivecote_v2.HIVECOTEV2),\n", + " ('HydraClassifier',\n", + " aeon.classification.convolution_based._hydra.HydraClassifier),\n", + " ('InceptionTimeClassifier',\n", + " aeon.classification.deep_learning._inception_time.InceptionTimeClassifier),\n", + " ('IndividualInceptionClassifier',\n", + " aeon.classification.deep_learning._inception_time.IndividualInceptionClassifier),\n", + " ('IndividualLITEClassifier',\n", + " aeon.classification.deep_learning._lite_time.IndividualLITEClassifier),\n", + " ('IndividualOrdinalTDE',\n", + " aeon.classification.ordinal_classification._ordinal_tde.IndividualOrdinalTDE),\n", + " ('IndividualTDE', aeon.classification.dictionary_based._tde.IndividualTDE),\n", + " ('IntervalForestClassifier',\n", + " aeon.classification.interval_based._interval_forest.IntervalForestClassifier),\n", + " ('KNeighborsTimeSeriesClassifier',\n", + " aeon.classification.distance_based._time_series_neighbors.KNeighborsTimeSeriesClassifier),\n", + " ('LITETimeClassifier',\n", + " aeon.classification.deep_learning._lite_time.LITETimeClassifier),\n", + " ('LearningShapeletClassifier',\n", + " aeon.classification.shapelet_based._ls.LearningShapeletClassifier),\n", + " ('MLPClassifier', aeon.classification.deep_learning._mlp.MLPClassifier),\n", + " ('MUSE', aeon.classification.dictionary_based._muse.MUSE),\n", + " ('MiniRocketClassifier',\n", + " aeon.classification.convolution_based._minirocket.MiniRocketClassifier),\n", + " ('MultiRocketClassifier',\n", + " aeon.classification.convolution_based._multirocket.MultiRocketClassifier),\n", + " ('MultiRocketHydraClassifier',\n", + " aeon.classification.convolution_based._mr_hydra.MultiRocketHydraClassifier),\n", + " ('OrdinalTDE',\n", + " aeon.classification.ordinal_classification._ordinal_tde.OrdinalTDE),\n", + " ('QUANTClassifier',\n", + " 
aeon.classification.interval_based._quant.QUANTClassifier),\n", + " ('RDSTClassifier', aeon.classification.shapelet_based._rdst.RDSTClassifier),\n", + " ('REDCOMETS', aeon.classification.dictionary_based._redcomets.REDCOMETS),\n", + " ('RISTClassifier', aeon.classification.hybrid._rist.RISTClassifier),\n", + " ('RSTSF', aeon.classification.interval_based._rstsf.RSTSF),\n", + " ('RandomIntervalClassifier',\n", + " aeon.classification.interval_based._interval_pipelines.RandomIntervalClassifier),\n", + " ('RandomIntervalSpectralEnsembleClassifier',\n", + " aeon.classification.interval_based._rise.RandomIntervalSpectralEnsembleClassifier),\n", + " ('ResNetClassifier',\n", + " aeon.classification.deep_learning._resnet.ResNetClassifier),\n", + " ('RocketClassifier',\n", + " aeon.classification.convolution_based._rocket.RocketClassifier),\n", + " ('ShapeletTransformClassifier',\n", + " aeon.classification.shapelet_based._stc.ShapeletTransformClassifier),\n", + " ('SignatureClassifier',\n", + " aeon.classification.feature_based._signature_classifier.SignatureClassifier),\n", + " ('SummaryClassifier',\n", + " aeon.classification.feature_based._summary.SummaryClassifier),\n", + " ('SupervisedIntervalClassifier',\n", + " aeon.classification.interval_based._interval_pipelines.SupervisedIntervalClassifier),\n", + " ('SupervisedTimeSeriesForest',\n", + " aeon.classification.interval_based._stsf.SupervisedTimeSeriesForest),\n", + " ('TSFreshClassifier',\n", + " aeon.classification.feature_based._tsfresh.TSFreshClassifier),\n", + " ('TemporalDictionaryEnsemble',\n", + " aeon.classification.dictionary_based._tde.TemporalDictionaryEnsemble),\n", + " ('TimeCNNClassifier',\n", + " aeon.classification.deep_learning._cnn.TimeCNNClassifier),\n", + " ('TimeSeriesForestClassifier',\n", + " aeon.classification.interval_based._tsf.TimeSeriesForestClassifier)]" + ] + }, + "metadata": {}, + "execution_count": 16 + } + ], + "source": [ + "from aeon.utils.discovery import all_estimators\n", + 
"\n", + "all_estimators(\n", + " tag_filter={\"capability:multivariate\": True},\n", + " type_filter=\"classifier\",\n", + ")" + ], + "metadata": { + "id": "-efZQXWCh5IN", + "outputId": "778d4b99-7f28-4722-bbc1-c0937d8bdfb9", + "colab": { + "base_uri": "https://localhost:8080/" + } + } + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": false, + "id": "UBibXSh1h5IN" + }, + "source": [ + "An alternative for MTSC is to build a univariate classifier on each channel, then\n", + "ensemble. Channel ensembling can be easily done via ``ClassifierChannelEnsemble``\n", + "which fits classifiers independently to specified channels, then\n", + "combines predictions through a voting scheme. The example below builds a DrCIF\n", + "classifier on the first channel and a RocketClassifier on the fourth and fifth\n", + "dimensions, ignoring the second, third and sixth." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "xtlozU2Hh5IN", + "outputId": "9c5478f1-0184-4afa-87e3-1526988796fe", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0.9" + ] + }, + "metadata": {}, + "execution_count": 17 + } + ], + "source": [ + "from aeon.classification.compose import ClassifierChannelEnsemble\n", + "from aeon.classification.interval_based import DrCIFClassifier\n", + "\n", + "cls = ClassifierChannelEnsemble(\n", + " classifiers=[\n", + " (\"DrCIF0\", DrCIFClassifier(n_estimators=5, n_intervals=2)),\n", + " (\"ROCKET3\", RocketClassifier(n_kernels=1000)),\n", + " ],\n", + " channels=[[0], [3, 4]],\n", + ")\n", + "\n", + "cls.fit(motions, motions_labels)\n", + "y_pred = cls.predict(motions_test)\n", + "\n", + "accuracy_score(motions_test_labels, y_pred)" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## sklearn Compatibility\n", + "\n", + "`aeon` classifiers are compatible with `sklearn` model selection and\n", + "composition tools using 
`aeon` data formats. For example, cross-validation can\n", + "be performed using the `sklearn` `cross_val_score` and `KFold` functionality:" + ], + "metadata": { + "collapsed": false, + "id": "-7NDHcmzh5IN" + } + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Pw_ZNfJvh5IN", + "outputId": "7963c9b6-673f-4d66-95df-e7aa418e39ce", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([0.88888889, 0.66666667, 0.77777778, 0.77777778])" + ] + }, + "metadata": {}, + "execution_count": 18 + } + ], + "source": [ + "from sklearn.model_selection import KFold, cross_val_score\n", + "\n", + "cross_val_score(rocket, arrow, y=arrow_labels, cv=KFold(n_splits=4))" + ] + }, + { + "cell_type": "markdown", + "source": [ + "Parameter tuning can be done using `sklearn` `GridSearchCV`. For example, we can tune\n", + " the _k_ and distance measure for a K-NN classifier:" + ], + "metadata": { + "collapsed": false, + "id": "aJNXKkYHh5IO" + } + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "K67ps0Bnh5IO", + "outputId": "460d0d39-ae25-4cfa-ea50-02dde57654da", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0.8" + ] + }, + "metadata": {}, + "execution_count": 19 + } + ], + "source": [ + "from sklearn.model_selection import GridSearchCV\n", + "\n", + "from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier\n", + "\n", + "knn = KNeighborsTimeSeriesClassifier()\n", + "param_grid = {\"n_neighbors\": [1, 5], \"distance\": [\"euclidean\", \"dtw\"]}\n", + "parameter_tuning_method = GridSearchCV(knn, param_grid, cv=KFold(n_splits=4))\n", + "\n", + "parameter_tuning_method.fit(arrow, arrow_labels)\n", + "y_pred = parameter_tuning_method.predict(arrow_test)\n", + "\n", + "accuracy_score(arrow_test_labels, y_pred)" + 
] + }, + { + "cell_type": "markdown", + "source": [ + "Probability calibration is possible with the `sklearn` `CalibratedClassifierCV`:" + ], + "metadata": { + "collapsed": false, + "id": "FtiuhfARh5IO" + } + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "oyywFEuhh5IO", + "outputId": "719c1f06-7eff-429b-dd26-e7da6be20972", + "colab": { + "base_uri": "https://localhost:8080/" + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0.7485714285714286" + ] + }, + "metadata": {}, + "execution_count": 20 + } + ], + "source": [ + "from sklearn.calibration import CalibratedClassifierCV\n", + "\n", + "from aeon.classification.interval_based import DrCIFClassifier\n", + "\n", + "calibrated_drcif = CalibratedClassifierCV(\n", + " estimator=DrCIFClassifier(n_estimators=10, n_intervals=5), cv=4\n", + ")\n", + "\n", + "calibrated_drcif.fit(arrow, arrow_labels)\n", + "y_pred = calibrated_drcif.predict(arrow_test)\n", + "\n", + "accuracy_score(arrow_test_labels, y_pred)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": false, + "id": "flt4wrMOh5IO" + }, + "source": [ + "### Background info and references for classifiers used here\n", + "\n", + "#### KNeighborsTimeSeriesClassifier\n", + "\n", + "One nearest neighbour (1-NN) classification with Dynamic Time Warping (DTW) is\n", + "a [distance based](distance_based.ipynb) classifier and one of the most frequently used\n", + "approaches, although it is less accurate on average than the state of the art.\n", + "\n", + "#### RocketClassifier\n", + "The RocketClassifier is a [convolution based](convolution_based.ipynb) classifier\n", + "made up of a pipeline combination of the ROCKET transformation\n", + " (transformations.panel.rocket) and the sklearn RidgeClassifierCV classifier. The RocketClassifier is configurable to use variants MiniRocket and MultiRocket. ROCKET is based on generating random convolutional kernels. 
A large number of kernels are generated, then a linear classifier is built on their output.\n", + "\n", + "[1] Dempster, Angus, François Petitjean, and Geoffrey I. Webb. \"Rocket: exceptionally fast and accurate time series classification using random convolutional kernels.\" Data Mining and Knowledge Discovery (2020)\n", + "[arXiv version](https://arxiv.org/abs/1910.13051)\n", + "[DAMI 2020](https://link.springer.com/article/10.1007/s10618-020-00701-z)\n", + "\n", + "#### DrCIF\n", + "The Diverse Representation Canonical Interval Forest Classifier (DrCIF) is an\n", + "[interval based](interval_based.ipynb) classifier. The algorithm takes multiple\n", + "randomised intervals from each series and extracts a range of features. These features are used to build decision trees, which are in turn ensembled into a forest, in the style of a random forest.\n", + "\n", + "Original CIF classifier:\n", + "[2] Matthew Middlehurst and James Large and Anthony Bagnall. \"The Canonical Interval Forest (CIF) Classifier for Time Series Classification.\" IEEE International Conference on Big Data (2020)\n", + "[arXiv version](https://arxiv.org/abs/2008.09172)\n", + "[IEEE BigData (2020)](https://ieeexplore.ieee.org/abstract/document/9378424?casa_token=8g_IG5MLJZ4AAAAA:ItxW0bY4eCRwfdV9kLvf-8a8X73UFCYUGU9D19PwrHigjivLJVchxHwkM3Btn7vvlOJ_0HiLRa3LCA)\n", + "\n", + "The DrCIF adjustment was proposed in [3].\n", + "\n", + "#### HIVE-COTE 2.0 (HC2)\n", + "The HIerarchical VotE Collective of Transformation-based Ensembles is a meta ensemble\n", + " [hybrid](hybrid.ipynb) that combines classifiers built on different representations.\n", + " Version 2 combines DrCIF, TDE, an ensemble of RocketClassifiers called the Arsenal and the ShapeletTransformClassifier. It is one of the most accurate classifiers on the UCR and UEA time series archives.\n", + "\n", + "[3] Middlehurst, Matthew, James Large, Michael Flynn, Jason Lines, Aaron Bostrom, and Anthony Bagnall. 
\"HIVE-COTE 2.0: a new meta ensemble for time series classification.\" Machine Learning (2021)\n", + "[ML 2021](https://link.springer.com/article/10.1007/s10994-021-06057-9)\n", + "\n", + "#### LITETime Classifier\n", + "\n", + "The LITETimeClassifier is a lightweight [deep learning model](https://github.com/aeon-toolkit/aeon/blob/main/examples/classification/deep_learning.ipynb) designed specifically for efficient and accurate time series classification (TSC). It leverages techniques like depthwise separable convolutions to minimize the number of parameters and computational overhead without compromising performance.\n", + "\n", + "[4] Ismail-Fawaz et al. LITE: Light Inception with boosTing tEchniques for Time Series Classification, IEEE International Conference on Data Science and Advanced Analytics, 2023 [LITE (pdf)](https://germain-forestier.info/publis/dsaa2023.pdf)\n", + "\n", + "[5] Ismail-Fawaz, Ali, et al. “Look Into the LITE in Deep Learning for Time Series Classification.” arXiv preprint arXiv:2409.02869 (2024). [arXiv preprint](https://arxiv.org/abs/2409.02869)\n" + ] + }, + { + "cell_type": "code", + "source": [], + "metadata": { + "id": "ms0mSnWEU11v" + }, + "execution_count": null, + "outputs": [] } - ], - "source": [ - "from sklearn.calibration import CalibratedClassifierCV\n", - "\n", - "from aeon.classification.interval_based import DrCIFClassifier\n", - "\n", - "calibrated_drcif = CalibratedClassifierCV(\n", - " estimator=DrCIFClassifier(n_estimators=10, n_intervals=5), cv=4\n", - ")\n", - "\n", - "calibrated_drcif.fit(arrow, arrow_labels)\n", - "y_pred = calibrated_drcif.predict(arrow_test)\n", - "\n", - "accuracy_score(arrow_test_labels, y_pred)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": false - }, - "source": [ - "### Background info and references for classifiers used here\n", - "\n", - "#### KNeighborsTimeSeriesClassifier\n", - "\n", - "One nearest neighbour (1-NN) classification 
with Dynamic Time Warping (DTW) is\n", - "a [distance based](distance_based.ipynb) classifier and one of the most frequently used\n", - "approaches, although it is less accurate on average than the state of the art.\n", - "\n", - "#### RocketClassifier\n", - "The RocketClassifier is a [convolution based](convolution_based.ipynb) classifier\n", - "made up of a pipeline combination of the ROCKET transformation\n", - " (transformations.panel.rocket) and the sklearn RidgeClassifierCV classifier. The RocketClassifier is configurable to use variants MiniRocket and MultiRocket. ROCKET is based on generating random convolutional kernels. A large number are generated, then a linear classifier is built on the output.\n", - "\n", - "[1] Dempster, Angus, François Petitjean, and Geoffrey I. Webb. \"Rocket: exceptionally fast and accurate time series classification using random convolutional kernels.\" Data Mining and Knowledge Discovery (2020)\n", - "[arXiv version](https://arxiv.org/abs/1910.13051)\n", - "[DAMI 2020](https://link.springer.com/article/10.1007/s10618-020-00701-z)\n", - "\n", - "#### DrCIF\n", - "The Diverse Representation Canonical Interval Forest Classifier (DrCIF) is an\n", - "[interval based](interval_based.ipynb) classifier. The algorithm takes multiple\n", - "randomised intervals from each series and extracts a range of features. These features are used to build a decision tree, which in turn are ensembled into a decision tree forest, in the style of a random forest.\n", - "\n", - "Original CIF classifier:\n", - "[2] Matthew Middlehurst and James Large and Anthony Bagnall. 
\"The Canonical Interval Forest (CIF) Classifier for Time Series Classification.\" IEEE International Conference on Big Data (2020)\n", - "[arXiv version](https://arxiv.org/abs/2008.09172)\n", - "[IEEE BigData (2020)](https://ieeexplore.ieee.org/abstract/document/9378424?casa_token=8g_IG5MLJZ4AAAAA:ItxW0bY4eCRwfdV9kLvf-8a8X73UFCYUGU9D19PwrHigjivLJVchxHwkM3Btn7vvlOJ_0HiLRa3LCA)\n", - "\n", - "The DrCIF adjustment was proposed in [3].\n", - "\n", - "#### HIVE-COTE 2.0 (HC2)\n", - "The HIerarchical VotE Collective of Transformation-based Ensembles is a meta ensemble\n", - " [hybrid](hybrid.ipynb) that combines classifiers built on different representations.\n", - " Version 2 combines DrCIF, TDE, an ensemble of RocketClassifiers called the Arsenal and the ShapeletTransformClassifier. It is one of the most accurate classifiers on the UCR and UEA time series archives.\n", - "\n", - "[3] Middlehurst, Matthew, James Large, Michael Flynn, Jason Lines, Aaron Bostrom, and Anthony Bagnall. \"HIVE-COTE 2.0: a new meta ensemble for time series classification.\" Machine Learning (2021)\n", - "[ML 2021](https://link.springer.com/article/10.1007/s10994-021-06057-9)\n" - ] - } - ], - "metadata": { - "interpreter": { - "hash": "9d800c14abb2bd109b7479fe8830174a66f0a4a77373f77c2c7334932e1a4922" - }, - "kernelspec": { - "name": "python3", - "language": "python", - "display_name": "Python 3 (ipykernel)" + ], + "metadata": { + "interpreter": { + "hash": "9d800c14abb2bd109b7479fe8830174a66f0a4a77373f77c2c7334932e1a4922" + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.12" + }, + "colab": { + "provenance": [], + "gpuType": "T4" + }, + "accelerator": "GPU" }, - "language_info": { - "codemirror_mode": { - "name": 
"ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.12" - } - }, - "nbformat": 4, - "nbformat_minor": 0 + "nbformat": 4, + "nbformat_minor": 0 }