Add Generalized chi-squared distribution #1806

heliosdrm · 2023-11-27T22:55:36Z

This extends the collection of univariate continuous distributions with the Generalized chi-squared. All required methods are added, and some of the recommended ones, as indicated in https://juliastats.org/Distributions.jl/stable/extends/#Univariate-Distribution

Some notes:

I started considering Abhranil Das' code for Matlab (MIT-licensed) as reference, although progressively departed from it, so that eventually there is little in the Julia code that can be attributed to him. I have kept a comment where I mention the inspiration to define the cdf function, using Davies' algorithm to integrate the characteristic function. But I don't know if a more explicit attribution in the code or in the license should be made.
There are several auxiliary functions that should not be part of the API, and I have encapsulated all of them in the module GChisqComputations, after the example of the Chernoff distribution.
That module includes two functions to calculate the characteristic function: cf_explicit that implements the explicit formula, and cf_inherit that composes the results of cf(::Normal) and cf(::NoncentralChisq). Both are equivalent, and the code of the second is clearer, but cf(::GeneralizedChisq) calls the first one, because in a few tests that I have done, it is slightly (but not much) faster.
cdf and pdf are calculated by integration, using the Gil-Pelaez theorem (the algorithm for the CDF is given in Davies' paper, and the algorithm for the PDF has been indirectly derived from it). The integration is done using QuadGK.jl, which was already in the dependencies of Distributions.jl. The default tolerances of QuadGK.quadgk made the calculation of cdf at the median very slow, because there the integral is zero. Since in the ends of the distribution the integral is ±π/2, which is in the order of magnitude of the unit, I have used a fixed absolute tolerance of eps(one(T)) for the integration in the calculation of the CDF. I have not, however, changed the defaults for the calculation of the PDF.
quantile is calculated using Distributions.quantile_newton. However, there is no analytic solution to find the mode, which is the recommended starting point, so this is searched using the following strategy: (1) start at the mean of the distribution, and use it if it meets the convergence criteria; (2) otherwise define a bracket between that point and another at one standard deviation from the mean, in the direction towards the target probability (and if the bracket does not contain the target probability, extend it until it does); (3) bisect the bracket until one of the ends meets the convergence criteria, and then use that as the starting point. This seems to work fine, but it's a self-made algorithm, and perhaps there is a more efficient one that might be used instead.

Update master

* add `params` method * remove debug code

codecov-commenter · 2023-11-28T10:57:22Z

Codecov Report

Attention: Patch coverage is 96.71053% with 5 lines in your changes missing coverage. Please review.

Project coverage is 86.37%. Comparing base (1e6801d) to head (0bd3aca).

Files with missing lines	Patch %	Lines
src/univariate/continuous/generalizedchisq.jl	96.71%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1806      +/-   ##
==========================================
+ Coverage   86.19%   86.37%   +0.17%     
==========================================
  Files         146      147       +1     
  Lines        8768     8920     +152     
==========================================
+ Hits         7558     7705     +147     
- Misses       1210     1215       +5

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

mahaa2 · 2024-03-16T15:22:55Z

When this will be added to the main branch ?

mahaa2 · 2024-03-25T08:23:29Z

Hi,

I have played around with this extension.
As a Statistician, I strongly believe it is a very useful and very important distribution to have in main branch of the package Distribution.jl.

In general it seems to work very well and I will use for future research.

However, I notice that while doing my own tests the quadgk.jl package/function used inside the file generalizedchisq.jl
appears to fail accusing the error (line 260 of that file)

"
DomainError with 0.9999999999999964:
integrand produced NaN in the interval (0.9999999999999929, 1.0)
" ,

in the following explained cases.

Lets say that X ~ N_(µ, Σ) with particular vector dimension D = 2, 3, 4.
Then we now know that Y = q0 + q1'X + X'QX ~GChisq(w, ν, λ, m, s)
(the notation in the GChisq package is different, but also irrelevant here).

The parameter values (w, ν, λ, m, s) = T(q0, q1, Q, µ, Σ) are obtained accordingly with the transformation, say T,
following the paper (I present below my own code to replicate the experiments in Julia).

One can just run the below code and check the results. The error also seems to be exactly the same one appearing in here.

mutable struct parGChisq
    w::Vector{<: Real}
    ν::Vector{<: Real}
    λ::Vector{<: Real}
    m::Real
    s::Real
end

function mapGChisqpar(
    q0::Real,
    q1::Vector{<: Real},
    Q ::AbstractMatrix,
    µ ::Vector{<: Real},
    Σ ::AbstractMatrix,
    ch::Bool = false)::parGChisq
    
    """
     if   X ~ N(µ, Σ)
     then Υ = q0 + q1' X + X'QX ~ GChisq(w, ν, λ, m, s)
     
     see https://arxiv.org/pdf/2012.14331.pdf

     the option with ch = true
     does not work. why ? 
    """ 

    # check dimensionality match
    dimQ = size(Q, 1)
    dimq = length(q1)
    dimΣ = size(Σ, 1)
    dimµ = length(µ)

    dimQ !== dimq ? @error("dims mismatch Q, q1") : nothing
    dimΣ !== dimq ? @error("dims mismatch Σ, q1") : nothing
    dimΣ !== dimµ ? @error("dims mismatch Σ, µ") : nothing

    # check matrix conditions
    !issymmetric(Q) ? @error("Q is not symmetric") : nothing
    !isposdef(Σ)    ? @error("Σ is not PD") : nothing

    # take the square root matrix 
    S = !ch ? sqrt(Σ) : cholesky(Σ).L

    # standartize the space
    t_Q² = !ch ? Symmetric(S * Q * S) : Symmetric(S * Q * S')
    t_q1 = S * (2.0 * Q * µ + q1)
    t_q0 = q0 + dot(q1, µ) + dot(µ, Q, µ) 

    # get the generalized Chi-squared parameters
    d, R = eigen(t_Q²)
    d .= trunc.(d, digits = 9)
    # d[abs.(d) .< eps()] .= 0.0

    # compute b terms
    b = R' * t_q1

    # unique nonzeros eigenvalues
    # the unique function returns an ordered set
    iz = d .!= 0.0
    w  = unique(d[iz])
    ew = !isempty(w) 

    # warn if there are only zero eigenvalues
    ew ? nothing : @warn("Y will follow a Gaussian distribution") 

    # total dof of each nonzero eigenvalue
    ν = ew ? [sum(d .== i) for i in w] : Float64[]

    # total non-centrality for each nonzero eigenvalue
    λ = ew ? [sum(b[d .== x].^2.0) for x in w] ./ (4.0.*w.^2.0) : Float64[]
    
    # Gaussian parameters
    m = t_q0 - dot(w, λ)
    s = norm(b[.!iz]) 

    return parGChisq(w, ν, λ, m, s)
end


using Distributions, 
      LinearAlgebra,
      Plots


# tests
# dimension
D = 3 # 2 or 4

# set parameter of the const + linear + quadratic form
q0 = randn()
q1 = rand(D)
Q  = Diagonal(randn(D))
µ  = randn(D)
Σ  = rand(Wishart(D, Matrix(I, D, D)), 1)[];

# set the correspondent parameters of the GChisq
th = mapGChisqpar(q0, q1, Q, µ, Σ)

# set the GChisq distribution
π = GeneralizedChisq(th.w, th.ν, th.λ, th.m, th.s)

# test
@time pdf(π, mean(π))

AdaemmerP · 2024-11-09T09:25:17Z

Adding the generalized chi-squared distribution would be highly appreciated!

heliosdrm · 2024-11-10T07:47:11Z

If there is anything blocking the merging of this PR that could be solved, I'd love to help. Regarding the issue reported by @mahaa2, maybe it is related to what I had mentioned:

The default tolerances of QuadGK.quadgk made the calculation of cdf at the median very slow, because there the integral is zero. Since in the ends of the distribution the integral is ±π/2, which is in the order of magnitude of the unit, I have used a fixed absolute tolerance of eps(one(T)) for the integration in the calculation of the CDF. I have not, however, changed the defaults for the calculation of the PDF.

I see that this problem also happens in other functions of the package, as kldivergence (#1443), and in that case @devmotion comments that it's "safer to throw an error than to return a possibly incorrect value". So I'm not sure what should be done in this case: (a) set another tolerance default for the PDF, (b) document the issue around median values, (c) devise a workaround to change the starting point of the integration if this situation is detected, or (d) just leave it as it is.

Let aside that, there were some CI errors for nightly, but those also happened to other unrelated PR when this one was submitted, so I think they are not related to real issues in the code.

heliosdrm · 2024-12-23T12:37:03Z

Hi. Any suggestion about what might be done to improve this PR and get it merged? Thank you!

heliosdrm · 2025-01-22T11:46:05Z

I have updated the branch merging with current main. The new test failure with Julia 1.3 - macos seems unrelated to this PR. Besides I have tested the code of the error reported by @mahaa2 , code and I cannot reproduce it, it just works.

Is there anything else blocking this?

heliosdrm and others added 7 commits November 13, 2023 22:20

Merge pull request #1 from JuliaStats/master

fa4d903

Update master

add generalized chi squared distribution

8c4296d

clean generalizedchisq.jl

813fee4

optimize quantile(::GeneralizedChisq, ...)

63f0ccb

working code and tests of GeneralizedChisq

9632739

minor fixes and docs

ee71d8e

* add `params` method * remove debug code

fix type in runtests.jl

251c061

test the computation of Davies' terms

3ab5b06

heliosdrm and others added 2 commits January 22, 2025 11:23

Merge branch 'JuliaStats:master' into master

c6442a6

Merge remote-tracking branch 'origin/master' into GChisq

0bd3aca

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Generalized chi-squared distribution #1806

Add Generalized chi-squared distribution #1806

heliosdrm commented Nov 27, 2023

codecov-commenter commented Nov 28, 2023 •

edited

Loading

mahaa2 commented Mar 16, 2024 •

edited

Loading

mahaa2 commented Mar 25, 2024 •

edited

Loading

AdaemmerP commented Nov 9, 2024

heliosdrm commented Nov 10, 2024

heliosdrm commented Dec 23, 2024

heliosdrm commented Jan 22, 2025 •

edited

Loading

Add Generalized chi-squared distribution #1806

Are you sure you want to change the base?

Add Generalized chi-squared distribution #1806

Conversation

heliosdrm commented Nov 27, 2023

codecov-commenter commented Nov 28, 2023 • edited Loading

Codecov Report

mahaa2 commented Mar 16, 2024 • edited Loading

mahaa2 commented Mar 25, 2024 • edited Loading

AdaemmerP commented Nov 9, 2024

heliosdrm commented Nov 10, 2024

heliosdrm commented Dec 23, 2024

heliosdrm commented Jan 22, 2025 • edited Loading

codecov-commenter commented Nov 28, 2023 •

edited

Loading

mahaa2 commented Mar 16, 2024 •

edited

Loading

mahaa2 commented Mar 25, 2024 •

edited

Loading

heliosdrm commented Jan 22, 2025 •

edited

Loading