HoeffdingD.jl is a pure-Julia implementation of the Hoeffding measure of dependence as described in the original paper, "A Non-Parametric Test of Independence", in particular chapter 5. The package also implements the D-test of independence described in chapter 9 of the same paper.
The advantage of this statistic is that it can detect nonlinear relationships that Pearson's correlation and Spearman's rank correlation are unable to detect. The disadvantages are that:
- the computation takes more time than Pearson's correlation;
- the dependence value is affected by identical observations (ties) and by the order of the observations.
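To give a feel for where the extra computing time goes, here is a minimal sketch of the statistic in its commonly used rank-based form. It is not the code of this package: the name `naive_hoeffdingd` is hypothetical, the factor 30 (which makes a perfect monotone relationship without ties give 1) and the midrank treatment of ties are assumptions that may differ from what HoeffdingD.jl does, and the double loop is deliberately naive (O(n²)).

```julia
# Naive O(n^2) sketch of Hoeffding's D (illustrative, not the package's code).
# R and S are the ranks of x and y; Q[i] is 1 plus the number of points that
# lie below point i in both coordinates. Ties count 1/2 per tied coordinate
# (assumed midrank convention).
function naive_hoeffdingd(x::AbstractVector, y::AbstractVector)
    n = length(x)
    n == length(y) || throw(ArgumentError("x and y must have the same length"))
    n >= 5 || throw(ArgumentError("at least 5 observations are required"))

    R = ones(n)
    S = ones(n)
    Q = ones(n)
    for i in 1:n, j in 1:n
        j == i && continue
        cx = x[j] < x[i] ? 1.0 : (x[j] == x[i] ? 0.5 : 0.0)
        cy = y[j] < y[i] ? 1.0 : (y[j] == y[i] ? 0.5 : 0.0)
        R[i] += cx
        S[i] += cy
        Q[i] += cx * cy
    end

    D1 = sum((Q .- 1) .* (Q .- 2))
    D2 = sum((R .- 1) .* (R .- 2) .* (S .- 1) .* (S .- 2))
    D3 = sum((R .- 2) .* (S .- 2) .* (Q .- 1))

    return 30 * ((n - 2) * (n - 3) * D1 + D2 - 2 * (n - 2) * D3) /
           (n * (n - 1) * (n - 2) * (n - 3) * (n - 4))
end
```

Every observation is compared with every other one, whereas Pearson's correlation needs only a single pass over the data. Because the tie convention is an assumption, values for data with ties need not match those returned by the package.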
Enter the Pkg REPL by pressing ] from the Julia REPL. Then install the package with:
pkg> add https://github.com/ericqu/HoeffdingD
Here we demonstrate the classic example of detecting linear and quadratic relationships with the Hoeffding measure, contrasted with Pearson's correlation and Spearman's rank correlation.
x = -2:0.1:2
linear_f(x) = 2x ; quad_f(x) = x^2
y_linear = linear_f.(x)
y_quad = quad_f.(x)
which can be displayed as:
using Plots
scatter(x, y_linear, label="linear")
scatter!(x, y_quad, label="quadratic", legend=:bottomright)
savefig("docs/linquad.png")
using StatsBase
#Pearson Correlation
@show(StatsBase.cor(x, y_linear))
@show(StatsBase.cor(x, y_quad))
#Spearman's rank correlation
@show(StatsBase.corspearman(x, y_linear))
@show(StatsBase.corspearman(x, y_quad))
which gives
StatsBase.cor(x, y_linear) = 1.0
StatsBase.cor(x, y_quad) = 0.0
StatsBase.corspearman(x, y_linear) = 1.0
StatsBase.corspearman(x, y_quad) = 0.0
Both measures agree that there is no correlation for the quadratic relationship.
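The zero is a consequence of symmetry rather than of independence: on the negative half of the range y decreases with x, on the positive half it increases, and the two trends cancel. A small sketch using only the Statistics standard library illustrates this; the variable names `xs`, `ys`, `neg` and `pos` are introduced here for illustration only.

```julia
using Statistics

xs = collect(-2:0.1:2)
ys = xs .^ 2

# Split the sample at zero and correlate each half separately.
neg = xs .<= 0
pos = xs .>= 0

r_all = cor(xs, ys)            # the two halves cancel out
r_neg = cor(xs[neg], ys[neg])  # strongly negative on the left branch
r_pos = cor(xs[pos], ys[pos])  # strongly positive on the right branch

@show r_all r_neg r_pos
```

Each half on its own is strongly correlated, which is exactly the kind of structure a single correlation coefficient over the whole range cannot report.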
This can be contrasted with the Hoeffding measure:
using HoeffdingD
@show(HoeffdingD.hoeffdingd(x, y_linear))
@show(HoeffdingD.hoeffdingd(x, y_quad))
which gives:
HoeffdingD.hoeffdingd(x, y_linear) = 1.0
HoeffdingD.hoeffdingd(x, y_quad) = 0.2183712793468891
Like Pearson's correlation and Spearman's rank correlation, the Hoeffding measure indicates a perfect relationship in the linear case. For the quadratic relationship the D (dependence) value is non-zero, but can independence be ruled out? To answer that, we pass a significance level α (between 0 and 1) as a third argument.
@show(HoeffdingD.hoeffdingd(x, y_linear, 0.05))
@show(HoeffdingD.hoeffdingd(x, y_quad, 0.05))
which gives:
HoeffdingD.hoeffdingd(x, y_linear, 0.05) = (1.0, true)
HoeffdingD.hoeffdingd(x, y_quad, 0.05) = (0.2183712793468891, true)
The function now returns the D value together with the result of the D-test of independence for the given α. In both cases the test result is true, indicating that we can reject the null hypothesis H₀ of independence.
Please post your questions or issues in the Issues tab of the repository.