HoeffdingD.jl implements in pure Julia the Hoeffding measure of dependence.

ericqu/HoeffdingD

HoeffdingD.jl

HoeffdingD.jl implements in pure Julia the Hoeffding measure of dependence as described in the original paper, "A Non-Parametric Test of Independence" (Hoeffding, 1948), in particular chapter 5. The package also implements the D-test of independence described in chapter 9 of the same paper.

The advantage of this statistic is that it can detect nonlinear relationships (including non-monotonic ones) that Pearson's correlation and Spearman's rank correlation are unable to detect. The disadvantages are that:

  • the computation takes more time than Pearson's correlation.
  • the D (dependence) value is affected by identical observations (ties) and by the order of the observations.
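To make the statistic more concrete, here is a minimal sketch of the commonly cited rank-based form of Hoeffding's D for data without ties. It is not the package's implementation: the function name `naive_hoeffdingd` is hypothetical, the factor 30 is the usual rescaling that makes perfect monotone dependence equal to 1 (an assumption about the convention, although it is consistent with the value 1.0 shown in the Usage section), and ties are deliberately not handled.

```julia
# Naive O(n^2) sketch of Hoeffding's D for data WITHOUT ties.
# Illustrative only; HoeffdingD.jl may compute the statistic differently.
function naive_hoeffdingd(x::AbstractVector{<:Real}, y::AbstractVector{<:Real})
    n = length(x)
    n == length(y) || throw(ArgumentError("x and y must have the same length"))
    n >= 5 || throw(ArgumentError("at least 5 observations are required"))

    # r[i], s[i]: ranks of x[i] and y[i] (valid only when there are no ties).
    r = Float64[count(<=(x[i]), x) for i in 1:n]
    s = Float64[count(<=(y[i]), y) for i in 1:n]
    # c[i]: number of points strictly below (x[i], y[i]) in both coordinates.
    c = Float64[count(j -> x[j] < x[i] && y[j] < y[i], 1:n) for i in 1:n]

    A = sum((r .- 1) .* (r .- 2) .* (s .- 1) .* (s .- 2))
    B = sum((r .- 2) .* (s .- 2) .* c)
    C = sum(c .* (c .- 1))

    nf = Float64(n)  # floating point avoids integer overflow in the n^5 term
    num = A - 2 * (nf - 2) * B + (nf - 2) * (nf - 3) * C
    den = nf * (nf - 1) * (nf - 2) * (nf - 3) * (nf - 4)
    return 30 * num / den
end
```

For instance, `naive_hoeffdingd(1:5, 2 .* (1:5))` evaluates to 1.0. Because the sketch ignores ties, it should not be expected to reproduce the package's values on data with repeated observations.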

Installation

Enter the Pkg REPL by pressing ] from the Julia REPL, then install the package with:

pkg> add https://github.com/ericqu/HoeffdingD
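The same installation can also be scripted through the functional Pkg API, which is convenient in setup scripts or notebooks. This is a sketch assuming a Julia version recent enough to support the `url` keyword of `Pkg.add`, and network access to GitHub.

```julia
using Pkg

# Install directly from the repository URL (same URL as the pkg> command).
Pkg.add(url="https://github.com/ericqu/HoeffdingD")

# Load the package to confirm that the installation succeeded.
using HoeffdingD
```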

Usage

Here we demonstrate the classic example of detecting linear and quadratic relationships with the Hoeffding measure, contrasted with Pearson's correlation and Spearman's rank correlation.

Data generation

x = -2:0.1:2            # 41 evenly spaced points, symmetric around 0
linear_f(x) = 2x        # linear relationship
quad_f(x) = x^2         # quadratic relationship
y_linear = linear_f.(x)
y_quad = quad_f.(x)

which can be displayed as:

using Plots
scatter(x, y_linear, label="linear")
scatter!(x, y_quad, label="quadratic", legend=:bottomright)
savefig("docs/linquad.png")

[Figure: scatter plot of the linear and quadratic relationships (docs/linquad.png)]

Classic tests

using StatsBase

#Pearson Correlation
@show(StatsBase.cor(x, y_linear))
@show(StatsBase.cor(x, y_quad))
#Spearman's rank correlation
@show(StatsBase.corspearman(x, y_linear))
@show(StatsBase.corspearman(x, y_quad))

which gives

StatsBase.cor(x, y_linear) = 1.0
StatsBase.cor(x, y_quad) = 0.0
StatsBase.corspearman(x, y_linear) = 1.0
StatsBase.corspearman(x, y_quad) = 0.0

Both measures agree that there is no correlation for the quadratic relationship, even though y is fully determined by x.
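Why Pearson's correlation vanishes for the quadratic case follows from a short calculation. It assumes only that the x values are placed symmetrically around zero, as on the grid -2:0.1:2, so that the odd sums cancel:

```latex
\operatorname{cov}(x, x^2)
  = \frac{1}{n}\sum_{i=1}^{n} x_i^3
    - \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
      \left(\frac{1}{n}\sum_{i=1}^{n} x_i^2\right)
  = 0 - 0 \cdot \frac{1}{n}\sum_{i=1}^{n} x_i^2
  = 0
```

A zero covariance gives a zero correlation coefficient, regardless of how strongly y depends on x.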

Hoeffding D measure and D-test

This can be contrasted with the Hoeffding measure:

using HoeffdingD
@show(HoeffdingD.hoeffdingd(x, y_linear))
@show(HoeffdingD.hoeffdingd(x, y_quad))

which gives:

HoeffdingD.hoeffdingd(x, y_linear) = 1.0
HoeffdingD.hoeffdingd(x, y_quad) = 0.2183712793468891

Like Pearson's correlation and Spearman's rank correlation, the Hoeffding measure indicates perfect dependence for the linear relationship. For the quadratic relationship the D (dependence) value is non-zero, but is that enough to rule out independence? To answer that question, we run the D-test by passing a significance level α (between 0 and 1) as a third argument.

@show(HoeffdingD.hoeffdingd(x, y_linear, 0.05))
@show(HoeffdingD.hoeffdingd(x, y_quad, 0.05))

which gives:

HoeffdingD.hoeffdingd(x, y_linear, 0.05) = (1.0, true)
HoeffdingD.hoeffdingd(x, y_quad, 0.05) = (0.2183712793468891, true)

The function now returns a tuple: the D value and the result of the D-test of independence for the given α, where true means that the H₀ hypothesis of independence is rejected. In both cases, therefore, we can reject independence at α = 0.05.
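In a script, the tuple can be destructured so that the D value and the test decision are handled separately. The sketch below relies only on the call and return shape shown above; the variable names `d` and `reject_h0` are illustrative, and the package is assumed to be installed.

```julia
using HoeffdingD  # assumes the package is installed as described under Installation

x = -2:0.1:2
y_quad = x .^ 2   # same values as quad_f.(x) in the example above

# With α supplied, the call returns a (D, Bool) tuple.
d, reject_h0 = HoeffdingD.hoeffdingd(x, y_quad, 0.05)

if reject_h0
    println("D = $d: independence rejected at α = 0.05")
else
    println("D = $d: independence not rejected at α = 0.05")
end
```

Keeping the decision in its own Boolean variable makes it straightforward to screen many variable pairs and retain only those for which independence is rejected.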

Questions

Please post your questions or issues in the Issues tab.
