Commit
Merge pull request #178 from chhoumann/background-fix-scalers
Fix notation, formulas and cites in scalars in Background section
Showing 6 changed files with 28 additions and 20 deletions.
12 changes: 7 additions & 5 deletions
report_thesis/src/sections/background/preprocessing/min-max.tex
@@ -1,10 +1,12 @@
 \subsubsection{Min-Max Normalization}\label{subsec:min-max}
-Min-max normalization rescales the range of features to $[0, 1]$ or $[a, b]$, where $a$ and $b$ represent the new minimum and maximum values, respectively.
+Min-max normalization rescales the range of features to a specific range $[a, b]$, where $a$ and $b$ represent the new minimum and maximum values, respectively.
 The goal is to normalize the range of the data to a specific scale, typically 0 to 1.
-Mathematically, min-max normalization is defined as:
+The min-max normalization of a feature vector $\mathbf{x}$ is given by:
+
 $$
-v' = \frac{v - \min(F)}{\max(F) - \min(F)} \times (b - a) + a,
+x'_i = \frac{x_i - \min(\mathbf{x})}{\max(\mathbf{x}) - \min(\mathbf{x})}(b - a) + a,
 $$
-where $v$ is the original value, $\min(F)$ and $\max(F)$ are the minimum and maximum values of the feature $F$, respectively.
 
-This type of normalization is beneficial because it ensures that each feature contributes equally to the analysis, regardless of its original scale.
+where $x_i$ is the original value, $\min(\mathbf{x})$ and $\max(\mathbf{x})$ are the minimum and maximum values of the feature vector $\mathbf{x}$, respectively, and $x'_i$ is the normalized feature value.
+
+This type of normalization is beneficial because it ensures that each feature contributes equally to the analysis, regardless of its original scale~\cite{dataminingConcepts}.
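
A quick sanity check of the updated min-max formula, using assumed values that do not come from the thesis: take $\mathbf{x} = (2, 4, 10)$ and the target range $[a, b] = [0, 1]$, so that $\min(\mathbf{x}) = 2$ and $\max(\mathbf{x}) = 10$. The middle value then maps to

$$
x'_2 = \frac{4 - 2}{10 - 2}(1 - 0) + 0 = 0.25,
$$

while the endpoints $2$ and $10$ map to $0$ and $1$. With the assumed range $[a, b] = [-1, 1]$, the same value would map to $0.25 \cdot 2 - 1 = -0.5$. The formula is undefined when $\max(\mathbf{x}) = \min(\mathbf{x})$, that is, for a constant feature.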
8 changes: 5 additions & 3 deletions
report_thesis/src/sections/background/preprocessing/robust_scaler.tex
@@ -1,8 +1,10 @@
 \subsubsection{Robust Scaler}
 The robust scaler is a normalization technique that removes the median and scales the data according to the quantile range.
-The formula for the robust scaler is given by:
+The robust scaler of a feature vector $\mathbf{x}$ is given by:
+
 $$
-X_{\text{scaled}} = \frac{X - \text{Q1}(X)}{\text{Q3}(X) - \text{Q1}(X)} \: ,
+x'_i = \frac{x_i - \text{Q1}(\mathbf{x})}{\text{Q3}(\mathbf{x}) - \text{Q1}(\mathbf{x})} \: ,
 $$
-where $X$ is the original data, $\text{Q1}(X)$ is the first quartile of $X$, and $\text{Q3}(X)$ is the third quartile of $X$.
+
+where $x_i$ is the original feature value, $\text{Q1}(\mathbf{x})$ is the first quartile of the feature vector $\mathbf{x}$, and $\text{Q3}(\mathbf{x})$ is the third quartile of the feature vector $\mathbf{x}$.
 This technique can be advantageous in cases where the data contains outliers, as it relies on the median and quantile range instead of the mean and variance, both of which are sensitive to outliers~\cite{Vasques2024}.
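
A small worked example, with assumed values that do not come from the thesis: take $\mathbf{x} = (1, 2, 3, 4, 100)$ and compute quartiles by linear interpolation (an assumption; other quartile conventions give slightly different values), so that $\text{Q1}(\mathbf{x}) = 2$, the median is $3$, and $\text{Q3}(\mathbf{x}) = 4$. The formula as committed maps the outlier to

$$
x'_5 = \frac{100 - 2}{4 - 2} = 49 .
$$

The surrounding prose describes removing the median, whereas the formula subtracts $\text{Q1}(\mathbf{x})$. A median-centered variant, which is the form used by, for example, scikit-learn's \texttt{RobustScaler}, would read

$$
x'_i = \frac{x_i - \text{median}(\mathbf{x})}{\text{Q3}(\mathbf{x}) - \text{Q1}(\mathbf{x})} \: ,
$$

and would give $x'_3 = 0$ and $x'_5 = (100 - 3)/2 = 48.5$ for the same data. The two forms differ only by a constant shift, so the scaling by the interquartile range is identical in both.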
10 changes: 5 additions & 5 deletions
report_thesis/src/sections/background/preprocessing/z-score.tex
@@ -1,12 +1,12 @@
 \subsubsection{Z-score Normalization}
-Z-score normalization, also standardization, transforms data to have a mean of zero and a standard deviation of one.
+Z-score normalization, also known as zero-mean normalization, transforms data to have a mean of zero and a standard deviation of one.
 This technique is useful when the actual minimum and maximum of a feature are unknown or when outliers may significantly skew the distribution.
-The formula for Z-score normalization is given by:
+The z-score normalization of a feature vector \(\mathbf{x}\) is given by:
 
 $$
-v' = \frac{v - \overline{F}}{\sigma_F},
+x'_i = \frac{x_i - \overline{\mathbf{x}}}{\sigma_\mathbf{x}},
 $$
 
-where $v$ is the original value, $\overline{F}$ is the mean of the feature $F$, and $\sigma_F$ is the standard deviation of $F$.
+where \(x_i\) is the original value, \(\overline{\mathbf{x}}\) is the mean of the feature vector \(\mathbf{x}\), \(\sigma_\mathbf{x}\) is the standard deviation of the feature vector \(\mathbf{x}\), and \(x'_i\) is the normalized feature value.
 By transforming the data using the Z-score, each value reflects its distance from the mean in terms of standard deviations.
-Z-score normalization is particularly advantageous in scenarios where data features have different units or scales, or when preparing data for algorithms that assume normally distributed inputs~\cite{dataminingConcepts}.
+Z-score normalization is particularly advantageous in scenarios where data features have different units or scales, or when preparing data for algorithms that assume normally distributed inputs~\cite{dataminingConcepts}.
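
A quick check of the updated z-score formula, using assumed values that do not come from the thesis: take \(\mathbf{x} = (2, 4, 4, 4, 5, 5, 7, 9)\), for which \(\overline{\mathbf{x}} = 40/8 = 5\). Assuming the population standard deviation (the text does not say whether the population or the sample form is intended), \(\sigma_\mathbf{x} = \sqrt{32/8} = 2\), so the largest value maps to

$$
x'_8 = \frac{9 - 5}{2} = 2,
$$

meaning it lies two standard deviations above the mean, and the smallest value maps to \(x'_1 = (2 - 5)/2 = -1.5\). With the sample standard deviation, \(\sigma_\mathbf{x} = \sqrt{32/7} \approx 2.14\) and the normalized values shrink slightly.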