Merge pull request #179 from chhoumann/background-fix-transformations
Consistent notation for transformations
Pattrigue authored Jun 5, 2024
2 parents 67e9f5a + 47d6a73 commit e492078
Showing 1 changed file with 14 additions and 12 deletions.
@@ -3,31 +3,33 @@ \subsubsection{Power Transformation}
They are particularly useful in statistical modeling and data analysis to meet the assumptions of linear models.

One of the first influential power transformation techniques is the Box-Cox power transform, introduced by \citet{BoxAndCox} in 1964.
This is defined for positive data and is aimed at normalizing data or making it more symmetric.
For a feature vector $\mathbf{x}$, the Box-Cox transformation is defined as:

$$
\psi^{\text{BC}}(\lambda, \mathbf{x}) =
\begin{cases}
\frac{\mathbf{x}^\lambda - 1}{\lambda}, & (\lambda \neq 0) \\
\log(\mathbf{x}), & (\lambda = 0)
\end{cases},
$$

where $\lambda$ is the transformation parameter.
The parameter $\lambda$ determines the extent and nature of the transformation: nonzero values of $\lambda$ apply a power transformation, and $\lambda = 0$ applies a logarithmic transformation.
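
The logarithmic case is the limit of the power case as $\lambda \to 0$, so the two branches join continuously.
As a short worked step for a single positive entry $x$ of the feature vector, expanding $x^\lambda = e^{\lambda \log x}$ around $\lambda = 0$ gives:

$$
\frac{x^\lambda - 1}{\lambda}
= \frac{e^{\lambda \log x} - 1}{\lambda}
= \log x + \frac{\lambda (\log x)^2}{2} + O(\lambda^2)
\;\longrightarrow\; \log x
\quad (\lambda \to 0).
$$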

To overcome the Box-Cox transformation's restriction to positive data, \citet{YeoJohnson} introduced a new family of power transformations that can handle both positive and negative values.
The Yeo-Johnson power transformation is defined as:

$$
\psi(\lambda, \mathbf{x}) =
\begin{cases}
\frac{(\mathbf{x} + 1)^\lambda - 1}{\lambda} & (\mathbf{x} \geq 0, \lambda \neq 0) \\
\log(\mathbf{x} + 1) & (\mathbf{x} \geq 0, \lambda = 0) \\
-\frac{(-\mathbf{x} + 1)^{2 - \lambda} - 1}{2 - \lambda} & (\mathbf{x} < 0, \lambda \neq 2) \\
-\log(-\mathbf{x} + 1) & (\mathbf{x} < 0, \lambda = 2)
\end{cases}
$$
where $\lambda$ is the transformation parameter and the conditions on $\mathbf{x}$ are evaluated elementwise.

For non-negative values, the Yeo-Johnson transformation reduces to the Box-Cox transformation applied to $\mathbf{x} + 1$.
The key benefit of the Yeo-Johnson transformation is its ability to handle any real number, making it a robust choice for transforming data to achieve approximate normality or symmetry.
This property is particularly beneficial for preparing data for statistical analyses and machine learning models that assume, or perform better with, approximately normally distributed input data.
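
The two definitions above can be sketched in a few lines of plain Python.
This is a minimal illustration, not the implementation used in this work: the function names `box_cox`, `yeo_johnson`, and `transform` are hypothetical, each transform is written for a single value and applied elementwise to the feature vector, and $\lambda$ is assumed to be given rather than estimated from the data.

```python
import math


def box_cox(x: float, lam: float) -> float:
    """Box-Cox transform of a single strictly positive value x."""
    if x <= 0:
        raise ValueError("Box-Cox is only defined for positive data")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam


def yeo_johnson(x: float, lam: float) -> float:
    """Yeo-Johnson transform of a single real value x."""
    if x >= 0:
        if lam == 0:
            return math.log1p(x)  # log(x + 1)
        return ((x + 1) ** lam - 1) / lam
    if lam == 2:
        return -math.log1p(-x)  # -log(-x + 1)
    return -((-x + 1) ** (2 - lam) - 1) / (2 - lam)


def transform(values, lam, func=yeo_johnson):
    """Apply a scalar transform elementwise to a feature vector (a list)."""
    return [func(v, lam) for v in values]


# Illustrative data and parameter value (both are assumptions).
feature = [-2.0, -0.5, 0.0, 1.5, 4.0]
transformed = transform(feature, lam=0.5)
```

With $\lambda = 1$ both Yeo-Johnson branches reduce to the identity, which makes a convenient sanity check. In practice $\lambda$ is usually chosen by maximum likelihood rather than fixed by hand.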
