Commit

[DOC] Shapelet classifier notebook extensions & minor docstring corrections (#1930)

* Corrected typos while doing a code review

* improved for clarity

* Typo corrections

* Some more typo corrections

* Correcting same typos in different docstring

* minor docstring corrections

* minor docstring typo correction

* small typo

* modified code to highlight the univariate example

* Further discussed shapelet distance visualisation

* Attempted a more high level description of the transform mapping

* Minor correction relating to last commit

* changed ! to . to avoid confusion with factorial

* Tried to make alpha similarity explanation more explicit by referencing overlap

* found small typo in docstring while going through notebook

* we don't know if the generated subsequence is a shapelet because we havent evaluated its discriminative ability

* more minor markdown text corrections

* My previous addition about the shapelet transform table was wrong - corrected.

* Describing the shapelet graphs

* Automatic `pre-commit` fixes

* fixing precommit error

---------

Co-authored-by: IRKnyazev <[email protected]>
IRKnyazev and IRKnyazev authored Aug 12, 2024
1 parent d51ff5c commit 84aa1b7
Showing 4 changed files with 123 additions and 107 deletions.
38 changes: 19 additions & 19 deletions aeon/classification/shapelet_based/_rdst.py
@@ -1,7 +1,7 @@
 """Random Dilated Shapelet Transform (RDST) Classifier.
 A Random Dilated Shapelet Transform classifier pipeline that simply performs a random
-shapelet dilated transform and build (by default) a ridge classifier on the output.
+shapelet dilated transform and builds (by default) a ridge classifier on the output.
 """

__maintainer__ = ["baraline"]
@@ -28,20 +28,20 @@ class RDSTClassifier(BaseClassifier):
 Implementation of the random dilated shapelet transform classifier pipeline
 along the lines of [1]_, [2]_. Transforms the data using the
 `RandomDilatedShapeletTransform` and then builds a `RidgeClassifierCV` classifier
-with standard scalling.
+with standard scaling.
 Parameters
 ----------
 max_shapelets : int, default=10000
-The maximum number of shapelet to keep for the final transformation.
-A lower number of shapelets can be kept if alpha similarity have discarded the
+The maximum number of shapelets to keep for the final transformation.
+A lower number of shapelets can be kept if alpha similarity has discarded the
 whole dataset.
 shapelet_lengths : array, default=None
-The set of possible length for shapelets. Each shapelet length is uniformly
-drawn from this set. If None, the shapelets length will be equal to
+The set of possible lengths for shapelets. Each shapelet length is uniformly
+drawn from this set. If None, the shapelet length will be equal to
 min(max(2,n_timepoints//2),11).
 proba_normalization : float, default=0.8
-This probability (between 0 and 1) indicate the chance of each shapelet to be
+This probability (between 0 and 1) indicates the chance of each shapelet to be
 initialized such as it will use a z-normalized distance, inducing either scale
 sensitivity or invariance. A value of 1 would mean that all shapelets will use
 a z-normalized distance.
@@ -50,22 +50,22 @@ class RDSTClassifier(BaseClassifier):
 Occurrence feature. If None, the 5th and the 10th percentiles (i.e. [5,10])
 will be used.
 alpha_similarity : float, default=0.5
-The strenght of the alpha similarity pruning. The higher the value, the lower
-the allowed number of common indexes with previously sampled shapelets
-when sampling a new candidate with the same dilation parameter.
-It can cause the number of sampled shapelets to be lower than max_shapelets if
-the whole search space has been covered. The default is 0.5, and the maximum is
-1. Value above it have no effect for now.
+The strength of the alpha similarity pruning. The higher the value, the fewer
+common indexes with previously sampled shapelets are allowed when sampling a
+new candidate with the same dilation parameter. It can cause the number of
+sampled shapelets to be lower than max_shapelets if the whole search space has
+been covered. The default is 0.5, and the maximum is 1. Values above it have
+no effect for now.
 use_prime_dilations : bool, default=False
-If True, restrict the value of the shapelet dilation parameter to be prime
+If True, restricts the value of the shapelet dilation parameter to be prime
 values. This can greatly speed-up the algorithm for long time series and/or
-short shapelet length, possibly at the cost of some accuracy.
+short shapelet lengths, possibly at the cost of some accuracy.
 distance: str="manhattan"
 Name of the distance function to be used. By default this is the
 manhattan distance. Other distances from the aeon distance modules can be used.
 estimator : BaseEstimator or None, default=None
 Base estimator for the ensemble, can be supplied a sklearn `BaseEstimator`. If
-`None` a default `RidgeClassifierCV` classifier is used with standard scalling.
+`None` a default `RidgeClassifierCV` classifier is used with standard scaling.
 save_transformed_data : bool, default=False
 If True, the transformed training dataset for all classifiers will be saved.
 class_weight{“balanced”, “balanced_subsample”}, dict or list of dicts, default=None
@@ -234,7 +234,7 @@ def _predict(self, X) -> np.ndarray:
 Parameters
 ----------
 X: np.ndarray shape (n_cases, n_channels, n_timepoints)
-The data to make prediction for.
+The data to make predictions for.
 Returns
 -------
@@ -246,12 +246,12 @@ def _predict(self, X) -> np.ndarray:
 return self._estimator.predict(X_t)

 def _predict_proba(self, X) -> np.ndarray:
-"""Predicts labels probabilities for sequences in X.
+"""Predicts label probabilities for sequences in X.
 Parameters
 ----------
 X: np.ndarray shape (n_cases, n_channels, n_timepoints)
-The data to make predict probabilities for.
+The data to predict probabilities for.
 Returns
 -------
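
Reviewer note: the default documented above for `shapelet_lengths`, min(max(2,n_timepoints//2),11), can be checked with a few lines of plain Python. This is only an illustrative sketch of the formula as written in the docstring; the helper name `default_shapelet_length` is hypothetical and is not part of aeon.

```python
def default_shapelet_length(n_timepoints: int) -> int:
    """Hypothetical helper mirroring the documented default for shapelet_lengths=None."""
    # Half the series length, but never fewer than 2 points and never more than 11.
    return min(max(2, n_timepoints // 2), 11)


# Very short series fall back to 2, long series are capped at 11.
for n in (3, 10, 100):
    print(n, default_shapelet_length(n))
```

With this formula a series of 3 points gives a length of 2, a series of 10 points gives 5, and any series of 22 points or more gives 11.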
@@ -1,6 +1,6 @@
 """Dilated Shapelet transformers.
-A modification of the classic Shapelet Transform which add a dilation parameter to
+A modification of the classic Shapelet Transform which adds a dilation parameter to
 Shapelets.
 """

@@ -38,33 +38,33 @@ class RandomDilatedShapeletTransform(BaseCollectionTransformer):
 - Length is randomly selected from shapelet_lengths parameter
 - Dilation is sampled as a function the shapelet length and time series length
 - Normalization is chosen randomly given the probability given as parameter
-- Value is sampled randomly from an input time series given the length and
+- Start value is sampled randomly from an input time series given the length and
 dilation parameter.
 - Threshold is randomly chosen between two percentiles of the distribution
 of the distance vector between the shapelet and another time series. This time
-serie is drawn from the same class if classes are given during fit. Otherwise,
+series is drawn from the same class if classes are given during fit. Otherwise,
 a random sample will be used. If there is only one sample per class, the same
 sample will be used.
 Then, once the set of shapelets have been initialized, we extract the shapelet
 features from each pair of shapelets and input series. Three features are extracted:
 - min d(S,X): the minimum value of the distance vector between a shapelet S and
 a time series X.
 - argmin d(S,X): the location of the minumum.
-- SO(d(S,X), threshold): The number of point in the distance vector that are
+- SO(d(S,X), threshold): The number of points in the distance vector that are
 bellow the threshold parameter of the shapelet.
 Parameters
 ----------
 max_shapelets : int, default=10000
-The maximum number of shapelet to keep for the final transformation.
-A lower number of shapelets can be kept if alpha similarity have discarded the
+The maximum number of shapelets to keep for the final transformation.
+A lower number of shapelets can be kept if alpha similarity has discarded the
 whole dataset.
 shapelet_lengths : array, default=None
-The set of possible length for shapelets. Each shapelet length is uniformly
-drawn from this set. If None, the shapelets length will be equal to
+The set of possible lengths for shapelets. Each shapelet length is uniformly
+drawn from this set. If None, the shapelet length will be equal to
 min(max(2,n_timepoints//2),11).
 proba_normalization : float, default=0.8
-This probability (between 0 and 1) indicate the chance of each shapelet to be
+This probability (between 0 and 1) indicates the chance of each shapelet to be
 initialized such as it will use a z-normalized distance, inducing either scale
 sensitivity or invariance. A value of 1 would mean that all shapelets will use
 a z-normalized distance.
@@ -73,16 +73,16 @@ class RandomDilatedShapeletTransform(BaseCollectionTransformer):
 Occurrence feature. If None, the 5th and the 10th percentiles (i.e. [5,10])
 will be used.
 alpha_similarity : float, default=0.5
-The strength of the alpha similarity pruning. The higher the value, the lower
-the allowed number of common indexes with previously sampled shapelets
-when sampling a new candidate with the same dilation parameter.
-It can cause the number of sampled shapelets to be lower than max_shapelets if
-the whole search space has been covered. The default is 0.5, and the maximum is
-1. Value above it have no effect for now.
+The strength of the alpha similarity pruning. The higher the value, the fewer
+common indexes with previously sampled shapelets are allowed when sampling a
+new candidate with the same dilation parameter. It can cause the number of
+sampled shapelets to be lower than max_shapelets if the whole search space
+has been covered. The default is 0.5, and the maximum is 1. Values above it
+have no effect for now.
 use_prime_dilations : bool, default=False
-If True, restrict the value of the shapelet dilation parameter to be prime
+If True, restricts the value of the shapelet dilation parameter to be prime
 values. This can greatly speed up the algorithm for long time series and/or
-short shapelet length, possibly at the cost of some accuracy.
+short shapelet lengths, possibly at the cost of some accuracy.
 distance: str="manhattan"
 Name of the distance function to be used. By default this is the
 manhattan distance. Other distances from the aeon distance modules can be used.
@@ -98,7 +98,7 @@ class RandomDilatedShapeletTransform(BaseCollectionTransformer):
 - shapelet values
 - length parameter
 - dilation parameter
-- treshold parameter
+- threshold parameter
 - normalization parameter
 - mean parameter
 - standard deviation parameter
@@ -109,8 +109,8 @@ class RandomDilatedShapeletTransform(BaseCollectionTransformer):
 Notes
 -----
-This implementation use all the features for multivariate shapelets, without
-affecting a random feature subsets to each shapelet as done in the original
+This implementation uses all the features for multivariate shapelets, without
+affecting a random feature subset to each shapelet as done in the original
 implementation. See `convst
 https://github.com/baraline/convst/blob/main/convst/transformers/rdst.py`_.
@@ -717,7 +717,7 @@ def dilated_shapelet_transform(X, shapelets, distance):
 -------
 X_new : array, shape=(n_cases, 3*n_shapelets)
 The transformed input time series with each shapelet extracting 3
-feature from the distance vector computed on each time series.
+features from the distance vector computed on each time series.
 """
 (
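
Reviewer note: the three features described in the transformer docstring (min d(S,X), argmin d(S,X) and the shapelet occurrence SO) can be illustrated in plain Python. This is a minimal sketch, not the aeon implementation: it assumes a univariate series, the Manhattan distance, no z-normalization, and a strict "below threshold" comparison. The function names `dilated_distance_vector` and `shapelet_features` are hypothetical.

```python
def dilated_distance_vector(series, shapelet, dilation=1):
    """Manhattan distance between the shapelet and every dilated window of the series."""
    length = len(shapelet)
    # A dilated window starting at `start` reads points start, start+d, start+2d, ...
    n_windows = len(series) - (length - 1) * dilation
    if n_windows < 1:
        raise ValueError("series is too short for this shapelet length and dilation")
    return [
        sum(abs(series[start + j * dilation] - shapelet[j]) for j in range(length))
        for start in range(n_windows)
    ]


def shapelet_features(series, shapelet, dilation, threshold):
    """Return (min, argmin, shapelet occurrence) for one shapelet and one series."""
    dist = dilated_distance_vector(series, shapelet, dilation)
    d_min = min(dist)
    d_argmin = dist.index(d_min)  # first location of the minimum
    # Assumption: occurrence counts distances strictly below the threshold.
    occurrence = sum(1 for d in dist if d < threshold)
    return d_min, d_argmin, occurrence


X = [0, 1, 0, 2, 0, 3, 0]
S = [1, 2]
features = shapelet_features(X, S, dilation=2, threshold=2.5)
print(features)
```

Here the distance vector is [3, 0, 3, 2, 3], so the features are a minimum of 0 at location 1 and an occurrence count of 2. Repeating this for every shapelet gives the 3*n_shapelets columns of `X_new` mentioned in the docstring.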
2 changes: 1 addition & 1 deletion aeon/visualisation/estimator/_shapelets.py
@@ -798,7 +798,7 @@ def visualize_best_shapelets_one_class(
 """
 Plot the n_shp best candidates for the class_id.
-Visualize best macth on two random samples and how the shapelet discriminate
+Visualize best match on two random samples and how the shapelet discriminate
 (X,y) with boxplots.
 Parameters
148 changes: 82 additions & 66 deletions examples/classification/shapelet_based.ipynb

Large diffs are not rendered by default.
