
Update Scaling preprocessors (#69)
* Renamed ZNormalizer to StandardScaler

* Implement RobustScaler.py

* Included more tests
LouisCarpentier42 authored Jan 7, 2025
1 parent c8c81eb commit 5d438ab
Showing 20 changed files with 730 additions and 471 deletions.
2 changes: 2 additions & 0 deletions docs/additional_information/changelog.rst
@@ -12,6 +12,7 @@ Added
- Implemented ``ClusterBasedLocalOutlierFactor`` (CBLOF) anomaly detector.
- Implemented ``KMeansAnomalyDetector`` anomaly detector.
- Implemented ``CopulaBasedOutlierDetector`` (COPOD) anomaly detector.
+- Implemented ``RobustScaler`` preprocessor.

Changed
^^^^^^^
@@ -25,6 +26,7 @@ Changed

Fixed
^^^^^
+- Renamed ``ZNormalizer`` to ``StandardScaler`` to align with the scikit-learn naming.


[0.2.3] - 2024-12-02
3 changes: 2 additions & 1 deletion docs/api/preprocessing.rst
@@ -12,7 +12,8 @@ Preprocessing module
.. autoclass:: dtaianomaly.preprocessing.ChainedPreprocessor
.. autoclass:: dtaianomaly.preprocessing.Identity
.. autoclass:: dtaianomaly.preprocessing.MinMaxScaler
-.. autoclass:: dtaianomaly.preprocessing.ZNormalizer
+.. autoclass:: dtaianomaly.preprocessing.StandardScaler
+.. autoclass:: dtaianomaly.preprocessing.RobustScaler
.. autoclass:: dtaianomaly.preprocessing.MovingAverage
.. autoclass:: dtaianomaly.preprocessing.ExponentialMovingAverage
.. autoclass:: dtaianomaly.preprocessing.SamplingRateUnderSampler
8 changes: 4 additions & 4 deletions docs/getting_started/examples/quantitative_evaluation.rst
@@ -45,9 +45,9 @@ is applied.
 preprocessors = [
     Identity(),
-    ZNormalizer(),
-    ChainedPreprocessor([MovingAverage(10), ZNormalizer()]),
-    ChainedPreprocessor([ExponentialMovingAverage(0.8), ZNormalizer()])
+    StandardScaler(),
+    ChainedPreprocessor([MovingAverage(10), StandardScaler()]),
+    ChainedPreprocessor([ExponentialMovingAverage(0.8), StandardScaler()])
 ]
We will now initialize our anomaly detectors. Each anomaly detector will be combined with each
@@ -124,7 +124,7 @@ as follows:
{ 'type': <name-of-component>, 'optional-param': <value-optional-parameter>}
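For instance, a hypothetical entry for the new ``RobustScaler`` could look as follows (the ``quantile_range`` value here is purely illustrative):

{ 'type': 'RobustScaler', 'quantile_range': (10.0, 90.0) }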
The ``'type'`` equals the name of the component, for example ``'LocalOutlierFactor'``
-or ``'ZNormalizer'``. This string must exactly match the object name of the component
+or ``'StandardScaler'``. This string must exactly match the object name of the component
you want to add to the workflow. In addition, it is possible to define hyperparameters
of each component. For example for ``'LocalOutlierFactor'``, you must define a
``'window_size'``, but can optionally also define a ``'stride'``. An error will be
97 changes: 97 additions & 0 deletions dtaianomaly/preprocessing/RobustScaler.py
@@ -0,0 +1,97 @@

import numpy as np
from typing import Optional, Tuple
from sklearn.exceptions import NotFittedError

from dtaianomaly.utils import get_dimension
from dtaianomaly.preprocessing.Preprocessor import Preprocessor


class RobustScaler(Preprocessor):
"""
Scale the time series using robust statistics.
The :py:class:`~dtaianomaly.preprocessing.RobustScaler` is similar to
:py:class:`~dtaianomaly.preprocessing.StandardScaler`, but uses robust
statistics rather than mean and standard deviation. The center of the data
is computed via the median, and the scale is computed as the range between
two quantiles (by default uses the IQR). This ensures that scaling is less
affected by outliers.
For a time series :math:`x`, center :math:`c` and scale :math:`s`, observation
:math:`x_i` is scaled to observation :math:`y_i` using the following equation:
.. math::
y_i = \\frac{x_i - c}{s}
Notice the similarity with the formula for standard scaling. For multivariate
time series, each attribute is scaled independently, each with an independent
scale and center.
Parameters
----------
quantile_range: tuple of (float, float), default = (25.0, 75.0)
Quantile range used to compute the ``scale_`` of the robust scaler.
By default, this is equal to the Inter Quantile Range (IQR). The first
value of the quantile range corresponds to the smallest quantile, the
second value corresponds to the larger quantile. If the first value is
not smaller than the second value, an error will be thrown. The values
must also both be in the range [0, 100].
Attributes
----------
center_: array-like of shape (n_attributes)
The median value in each attribute of the training data.
scale_: array-like of shape (n_attributes)
The quantile range for each attribute of the training data.
Raises
------
NotFittedError
If the `transform` method is called before fitting this StandardScaler.
"""
    quantile_range: Tuple[float, float]
    center_: np.ndarray
    scale_: np.ndarray

    def __init__(self, quantile_range: Tuple[float, float] = (25.0, 75.0)):
        if not isinstance(quantile_range, tuple):
            raise TypeError("`quantile_range` should be a tuple")
        if len(quantile_range) != 2:
            raise ValueError("'quantile_range' should consist of exactly two values (length of 2)")
        if not isinstance(quantile_range[0], (float, int)) or isinstance(quantile_range[0], bool):
            raise TypeError("The first element of `quantile_range` should be a float or int")
        if not isinstance(quantile_range[1], (float, int)) or isinstance(quantile_range[1], bool):
            raise TypeError("The second element of `quantile_range` should be a float or int")
        if quantile_range[0] < 0.0:
            raise ValueError("the first element in 'quantile_range' must be at least 0.0")
        if quantile_range[1] > 100.0:
            raise ValueError("the second element in 'quantile_range' must be at most 100.0")
        if not quantile_range[0] < quantile_range[1]:
            raise ValueError("the first element in 'quantile_range' must be smaller than the second element in 'quantile_range'")
        self.quantile_range = quantile_range

    def _fit(self, X: np.ndarray, y: Optional[np.ndarray] = None) -> 'RobustScaler':
        if get_dimension(X) == 1:
            # Univariate case: compute a single center and scale.
            # NaN-aware percentiles are used, consistent with nanmedian above.
            self.center_ = np.array([np.nanmedian(X)])
            q_min = np.nanpercentile(X, q=self.quantile_range[0])
            q_max = np.nanpercentile(X, q=self.quantile_range[1])
            self.scale_ = np.array([q_max - q_min])
        else:
            # Multivariate case: compute a center and scale per attribute.
            self.center_ = np.nanmedian(X, axis=0)
            q_min = np.nanpercentile(X, q=self.quantile_range[0], axis=0)
            q_max = np.nanpercentile(X, q=self.quantile_range[1], axis=0)
            self.scale_ = q_max - q_min
        return self

    def _transform(self, X: np.ndarray, y: Optional[np.ndarray] = None) -> Tuple[np.ndarray, Optional[np.ndarray]]:
        if not (hasattr(self, 'center_') and hasattr(self, 'scale_')):
            raise NotFittedError(f'Call `fit` before using transform on {str(self)}')
        if not ((len(X.shape) == 1 and self.center_.shape[0] == 1) or X.shape[1] == self.center_.shape[0]):
            n_attributes = 1 if len(X.shape) == 1 else X.shape[1]
            raise AttributeError(f'Trying to robust scale a time series with {n_attributes} attributes while it was fitted on {self.center_.shape[0]} attributes!')

        # Observations that scale to NaN (e.g., because the scale is zero) are kept unchanged.
        X_ = (X - self.center_) / self.scale_
        return np.where(np.isnan(X_), X, X_), y
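
A quick usage sketch of the new scaler (illustrative values; this assumes the public ``fit``/``transform`` wrappers of ``Preprocessor`` follow the sklearn convention, i.e. ``fit`` returns ``self`` and ``transform`` returns the tuple produced by ``_transform``):

import numpy as np
from dtaianomaly.preprocessing import RobustScaler, StandardScaler

# A univariate series with a single extreme outlier.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 1000.0])

robust, _ = RobustScaler().fit(x).transform(x)
standard, _ = StandardScaler().fit(x).transform(x)

# The robust scaling of the inliers is driven by the median and IQR and is
# barely affected by the outlier; standard scaling squeezes the inliers
# together because the outlier inflates the standard deviation.
print(robust[:5], standard[:5])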
dtaianomaly/preprocessing/ZNormalizer.py → dtaianomaly/preprocessing/StandardScaler.py (renamed)
@@ -6,9 +6,9 @@
from dtaianomaly.preprocessing.Preprocessor import Preprocessor


-class ZNormalizer(Preprocessor):
+class StandardScaler(Preprocessor):
"""
-    Rescale to zero mean, unit variance.
+    Standard scale the data: rescale to zero mean, unit variance.
Rescale to zero mean and unit variance. A mean value and standard
deviation are computed on a training set, after which these values
@@ -37,7 +37,7 @@ class ZNormalizer(Preprocessor):
Raises
------
NotFittedError
-        If the `transform` method is called before fitting this MinMaxScaler.
+        If the `transform` method is called before fitting this StandardScaler.
"""
min_std: float
mean_: np.array
@@ -46,7 +46,7 @@ class ZNormalizer(Preprocessor):
def __init__(self, min_std: float = 1e-9):
self.min_std = min_std

-    def _fit(self, X: np.ndarray, y: Optional[np.ndarray] = None) -> 'ZNormalizer':
+    def _fit(self, X: np.ndarray, y: Optional[np.ndarray] = None) -> 'StandardScaler':
if len(X.shape) == 1 or X.shape[1] == 1:
# univariate case
self.mean_ = np.array([np.nanmean(X)])
@@ -62,7 +62,7 @@ def _transform(self, X: np.ndarray, y: Optional[np.ndarray] = None) -> Tuple[np.
if not (hasattr(self, 'mean_') and hasattr(self, 'std_')):
raise NotFittedError(f'Call `fit` before using transform on {str(self)}')
if not ((len(X.shape) == 1 and self.mean_.shape[0] == 1) or X.shape[1] == self.mean_.shape[0]):
-        raise AttributeError(f'Trying to z-normalize a time series with {X.shape[0]} attributes while it was fitted on {self.min_.shape[0]} attributes!')
+        raise AttributeError(f'Trying to standard scale a time series with {X.shape[0]} attributes while it was fitted on {self.mean_.shape[0]} attributes!')

# If the std of all attributes is 0, then no transformation happens
if np.all((self.std_ < self.min_std)):
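Note that the ``min_std`` guard means a fully constant series is passed through unchanged; a quick check (a sketch, under the same assumptions about the public ``fit``/``transform`` wrappers as above):

import numpy as np
from dtaianomaly.preprocessing import StandardScaler

constant = np.ones(100)
scaled, _ = StandardScaler().fit(constant).transform(constant)
assert np.array_equal(scaled, constant)  # std is below min_std: no transformation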
8 changes: 5 additions & 3 deletions dtaianomaly/preprocessing/__init__.py
@@ -8,24 +8,26 @@
from .Preprocessor import Preprocessor, check_preprocessing_inputs, Identity
from .ChainedPreprocessor import ChainedPreprocessor
from .MinMaxScaler import MinMaxScaler
-from .ZNormalizer import ZNormalizer
+from .StandardScaler import StandardScaler
from .MovingAverage import MovingAverage
from .ExponentialMovingAverage import ExponentialMovingAverage
from .UnderSampler import SamplingRateUnderSampler, NbSamplesUnderSampler
from .Differencing import Differencing
from .PiecewiseAggregateApproximation import PiecewiseAggregateApproximation
+from .RobustScaler import RobustScaler

__all__ = [
'Preprocessor',
'check_preprocessing_inputs',
'Identity',
'ChainedPreprocessor',
'MinMaxScaler',
-    'ZNormalizer',
+    'StandardScaler',
'MovingAverage',
'ExponentialMovingAverage',
'SamplingRateUnderSampler',
'NbSamplesUnderSampler',
'Differencing',
-    'PiecewiseAggregateApproximation'
+    'PiecewiseAggregateApproximation',
+    'RobustScaler'
]
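With these exports in place, both scalers can be imported directly from the preprocessing package (a minimal sanity check, assuming ``dtaianomaly`` is installed):

from dtaianomaly.preprocessing import StandardScaler, RobustScaler

scaler = RobustScaler(quantile_range=(25.0, 75.0))  # the default IQR
normalizer = StandardScaler()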
7 changes: 5 additions & 2 deletions dtaianomaly/workflow/workflow_from_config.py
@@ -345,10 +345,10 @@ def preprocessing_entry(entry):
raise TypeError(f'Too many parameters given for entry: {entry}')
return preprocessing.MinMaxScaler()

-    elif processing_type == 'ZNormalizer':
+    elif processing_type == 'StandardScaler':
         if len(entry_without_type) > 0:
             raise TypeError(f'Too many parameters given for entry: {entry}')
-        return preprocessing.ZNormalizer()
+        return preprocessing.StandardScaler()

elif processing_type == 'MovingAverage':
return preprocessing.MovingAverage(**entry_without_type)
@@ -368,6 +368,9 @@ def preprocessing_entry(entry):
elif processing_type == 'PiecewiseAggregateApproximation':
return preprocessing.PiecewiseAggregateApproximation(**entry_without_type)

+    elif processing_type == 'RobustScaler':
+        return preprocessing.RobustScaler(**entry_without_type)

elif processing_type == 'ChainedPreprocessor':
if len(entry_without_type) != 1:
raise TypeError(f'ChainedPreprocessor must have base_preprocessors as key: {entry}')
6 changes: 3 additions & 3 deletions notebooks/Config.json
@@ -20,14 +20,14 @@
],
"preprocessors": [
{"type": "Identity"},
{"type": "ZNormalizer"},
{"type": "StandardScaler"},
{"type": "ChainedPreprocessor", "base_preprocessors": [
{"type": "MovingAverage", "window_size": 10},
{"type": "ZNormalizer"}
{"type": "StandardScaler"}
]},
{"type": "ChainedPreprocessor", "base_preprocessors": [
{"type": "ExponentialMovingAverage", "alpha": 0.8},
{"type": "ZNormalizer"}
{"type": "StandardScaler"}
]}
],
"detectors": [
2 changes: 1 addition & 1 deletion notebooks/Industrial-anomaly-detection.ipynb
@@ -313,7 +313,7 @@
"source": [
"##### (2) Preprocessors\n",
"\n",
"Next, we can define zero, one or multiple preprocessors to process the data. ``dtaianomaly`` already offers a number of preprocessors, (e.g., ``MinMaxScaler``, ``ZNormalizer``, ``MovingAverage``, ``ChainedPreprocessor``, etc.), but it is also possible to develop a custom preprocessor. For example, the wind turbine data has missing values, which typically cannot be handled by anomaly detectors. To cope with these, we define an ``Imputer`` preprocessor as below. All we need to do for this is add ``Preprocessor`` as a parent of the class and implement the ``._fit()`` and ``._transform()`` methods. For the ``Imputer``, no fitting is required, and the missing values are replaced by the previous observed value. Note that more complex imputation strategies could be implemented as well. "
"Next, we can define zero, one or multiple preprocessors to process the data. ``dtaianomaly`` already offers a number of preprocessors, (e.g., ``MinMaxScaler``, ``StandardScaler``, ``MovingAverage``, ``ChainedPreprocessor``, etc.), but it is also possible to develop a custom preprocessor. For example, the wind turbine data has missing values, which typically cannot be handled by anomaly detectors. To cope with these, we define an ``Imputer`` preprocessor as below. All we need to do for this is add ``Preprocessor`` as a parent of the class and implement the ``._fit()`` and ``._transform()`` methods. For the ``Imputer``, no fitting is required, and the missing values are replaced by the previous observed value. Note that more complex imputation strategies could be implemented as well. "
],
"id": "6d23ff059c6f7832"
},
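As a rough illustration of that pattern, a forward-fill ``Imputer`` could look like the sketch below (hypothetical code, not part of this commit; it assumes only the ``Preprocessor`` interface shown in ``RobustScaler.py`` above):

import numpy as np
from typing import Optional, Tuple
from dtaianomaly.preprocessing import Preprocessor

class Imputer(Preprocessor):
    """Replace each missing value by the previous observed value."""

    def _fit(self, X: np.ndarray, y: Optional[np.ndarray] = None) -> 'Imputer':
        return self  # No fitting is required for forward-fill imputation.

    def _transform(self, X: np.ndarray, y: Optional[np.ndarray] = None) -> Tuple[np.ndarray, Optional[np.ndarray]]:
        X_ = X.astype(float)  # astype returns a copy, so X itself is not modified
        # Walk along the time axis and carry the last observed value forward.
        # Leading missing values (with no previous observation) remain NaN.
        for t in range(1, X_.shape[0]):
            X_[t] = np.where(np.isnan(X_[t]), X_[t - 1], X_[t])
        return X_, y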