Fix ValueSupport to allow non-integer discrete support #941

Closed
wants to merge 36 commits into from
7f354a2
Implement Dirac distribution
aplavin May 4, 2019
5dc860d
Merge branch 'master' into patch-1
aplavin May 18, 2019
d6503c4
Float return values instead of Int
aplavin May 18, 2019
d438ccc
Remove the restriction that the support values of a DiscreteNonParame…
DilumAluthge Jun 18, 2019
caed648
Merge branch 'da/discrete-non-parametric' of https://github.com/Dilum…
richardreeve Jul 25, 2019
2afcb05
ValueSupport is now parameterised by the eltype of its support, and i…
richardreeve Jul 26, 2019
5965616
Fix univariates and multivariates to use new ValueSupport - especiall…
richardreeve Jul 26, 2019
e50262f
Update testing.
richardreeve Jul 26, 2019
31051b8
Fixing log[c]cdf to work with DiscreteNonParametric.
richardreeve Jul 26, 2019
61f8114
Merge branch 'patch-1' of https://github.com/aplavin/Distributions.jl…
richardreeve Jul 28, 2019
335766f
Fixing function signatures, especially for quantile(), which can take…
richardreeve Jul 28, 2019
4783b13
Categorical can support any integer.
richardreeve Jul 28, 2019
d73a229
Fixing eltypes for Categorical and Dirac types.
richardreeve Jul 29, 2019
a9359ec
Fixing Dirac and DiscreteNonParametric
richardreeve Jul 30, 2019
fe08381
Minor ValueSupport fixes
richardreeve Jul 30, 2019
a4eceb6
Add in AbstractMixtureDistribution as supertype with AbstractMixtureM…
richardreeve Jul 30, 2019
88de1b2
Add in ZeroInflated and Hurdle distributions as special cases of Comp…
richardreeve Jul 30, 2019
67dca49
Merge branch 'master' into rr/support
richardreeve Jul 30, 2019
48970ea
Revert whitespace fix.
richardreeve Jul 30, 2019
20872be
New type hierarchy for ValueSupport to allow non-Int, non-Float64 elt…
richardreeve Jul 30, 2019
c6173e5
Simplify variate_form() signature
richardreeve Jul 31, 2019
5a399cc
Refine eltype() signature
richardreeve Jul 31, 2019
0d99f67
Incorporate existing distributions with different eltypes into new fr…
richardreeve Jul 31, 2019
492c090
Merge branch 'rr/countable' into rr/support
richardreeve Jul 31, 2019
f4e5156
Remove NonMatrixDistribution
richardreeve Jul 31, 2019
f8e9918
Add in some more tests for new code.
richardreeve Jul 31, 2019
6642ae9
Merge branch 'master' into rr/countable
richardreeve Jul 31, 2019
42477c6
Updating nsamples() and adding testing.
richardreeve Aug 1, 2019
6c1c76e
Remove 0.7 on appveyor.
richardreeve Aug 1, 2019
8c6e326
Merge branch 'rr/countable' into rr/support
richardreeve Aug 1, 2019
4b64208
Update readme to mention pmf.
richardreeve Aug 1, 2019
bb9c2ac
Merge branch 'master' into rr/countable
richardreeve Aug 1, 2019
acad538
Merge branch 'master' into rr/countable
richardreeve Aug 2, 2019
0c148ef
Add in a specific subtype of countable support - ContiguousSupport - …
richardreeve Aug 3, 2019
d42c02a
Fix non-ContiguousSupport distributions.
richardreeve Aug 3, 2019
45809a4
Merge branch 'rr/countable' into rr/support
richardreeve Aug 3, 2019
2 changes: 1 addition & 1 deletion README.md
@@ -11,7 +11,7 @@ Distributions.jl
A Julia package for probability distributions and associated functions. Particularly, *Distributions* implements:

* Moments (e.g mean, variance, skewness, and kurtosis), entropy, and other properties
* Probability density/mass functions (pdf) and their logarithm (logpdf)
* Probability density/mass functions (pdf/pmf) and their logarithm (logpdf/logpmf)
* Moment generating functions and characteristic functions
* Sampling from population or from a distribution
* Maximum likelihood estimation
30 changes: 19 additions & 11 deletions docs/src/types.md
@@ -41,8 +41,10 @@ The `ValueSupport` sub-types defined in `Distributions.jl` are:

**Type** | **Element type** | **Descriptions**
--- | --- | ---
`Discrete` | `Int` | Samples take discrete values
`Continuous` | `Float64` | Samples take continuous real values
`DiscreteSupport{T}` | `T` | Samples take any discrete values
`Discrete = DiscreteSupport{Int}` | `Int` | Samples take `Int` values
`ContinuousSupport{T <: Number}` | `T` | Samples take continuous values
`Continuous = ContinuousSupport{Float64}` | `Float64` | Samples take continuous `Float64` values

Multiple samples are often organized into an array, depending on the variate form.

@@ -69,22 +71,28 @@ abstract type Distribution{F<:VariateForm,S<:ValueSupport} <: Sampleable{F,S} en
Distributions.Distribution
```

To simplify the use in practice, we introduce a series of type alias as follows:
To simplify the use in practice, we introduce a series of type aliases as follows:
```julia
const UnivariateDistribution{S<:ValueSupport} = Distribution{Univariate,S}
const MultivariateDistribution{S<:ValueSupport} = Distribution{Multivariate,S}
const MatrixDistribution{S<:ValueSupport} = Distribution{Matrixvariate,S}
const NonMatrixDistribution = Union{UnivariateDistribution, MultivariateDistribution}

const DiscreteDistribution{F<:VariateForm} = Distribution{F,Discrete}
const ContinuousDistribution{F<:VariateForm} = Distribution{F,Continuous}
const CountableDistribution{F<:VariateForm, C<:CountableSupport} = Distribution{F,C}
const DiscreteDistribution{F<:VariateForm} = CountableDistribution{F,Discrete}
const ContinuousDistribution{F<:VariateForm} = Distribution{F,Continuous}

const DiscreteUnivariateDistribution = Distribution{Univariate, Discrete}
const ContinuousUnivariateDistribution = Distribution{Univariate, Continuous}
const DiscreteMultivariateDistribution = Distribution{Multivariate, Discrete}
const ContinuousMultivariateDistribution = Distribution{Multivariate, Continuous}
const DiscreteMatrixDistribution = Distribution{Matrixvariate, Discrete}
const ContinuousMatrixDistribution = Distribution{Matrixvariate, Continuous}
const CountableUnivariateDistribution{C<:CountableSupport} = UnivariateDistribution{C}
const DiscreteUnivariateDistribution = CountableUnivariateDistribution{Discrete}
const ContinuousUnivariateDistribution = UnivariateDistribution{Continuous}

const CountableMultivariateDistribution{C<:CountableSupport} = MultivariateDistribution{C}
const DiscreteMultivariateDistribution = CountableMultivariateDistribution{Discrete}
const ContinuousMultivariateDistribution = MultivariateDistribution{Continuous}

const CountableMatrixDistribution{C<:CountableSupport} = MatrixDistribution{C}
const DiscreteMatrixDistribution = CountableMatrixDistribution{Discrete}
const ContinuousMatrixDistribution = MatrixDistribution{Continuous}
```

All methods applicable to `Sampleable` also applies to `Distribution`. The API for distributions of different variate forms are different (refer to [univariates](@ref univariates), [multivariates](@ref multivariates), and [matrix](@ref matrix-variates) for details).
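The parameterised support types in the table above can be illustrated with a small standalone sketch. It follows the names used in `src/common.jl` but prefixes everything with `Toy`, so all type names here are hypothetical stand-ins rather than the package's API, and `ToyArbitrarySupport` is an assumed example of a countable support that is not integer-valued.

```julia
# Toy re-creation of the hierarchy; every `Toy*` name is hypothetical.
abstract type ToyValueSupport{N} end
struct ToyContinuousSupport{N<:Number} <: ToyValueSupport{N} end
abstract type ToyCountableSupport{C} <: ToyValueSupport{C} end
struct ToyContiguousSupport{C<:Integer} <: ToyCountableSupport{C} end
# Assumed example: a countable support whose elements need not be integers.
struct ToyArbitrarySupport{C} <: ToyCountableSupport{C} end

const ToyDiscrete = ToyContiguousSupport{Int}
const ToyContinuous = ToyContinuousSupport{Float64}

# The element type is read off the type parameter instead of being hard-coded.
Base.eltype(::Type{<:ToyValueSupport{N}}) where {N} = N

println(eltype(ToyDiscrete))
println(eltype(ToyContinuous))
println(eltype(ToyArbitrarySupport{String}))
```

Under this design the old `Discrete` and `Continuous` names survive as aliases, so code that dispatches on them should keep working while new element types become expressible.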
43 changes: 33 additions & 10 deletions src/Distributions.jl
@@ -30,20 +30,28 @@ export
# generic types
VariateForm,
ValueSupport,
CountableSupport,
ContiguousSupport,
ContinuousSupport,
DiscontinuousSupport,
UnionSupport,
Univariate,
Multivariate,
Matrixvariate,
Discrete,
Continuous,
Discontinuous,
Sampleable,
Distribution,
UnivariateDistribution,
MultivariateDistribution,
MatrixDistribution,
NoncentralHypergeometric,
NonMatrixDistribution,
DiscreteDistribution,
ContinuousDistribution,
CountableUnivariateDistribution,
CountableMultivariateDistribution,
CountableMatrixDistribution,
DiscreteUnivariateDistribution,
DiscreteMultivariateDistribution,
DiscreteMatrixDistribution,
@@ -69,9 +77,11 @@ export
Chernoff,
Chi,
Chisq,
CompoundDistribution,
Cosine,
DiagNormal,
DiagNormalCanon,
Dirac,
Dirichlet,
DirichletMultinomial,
DiscreteUniform,
@@ -93,6 +103,7 @@ export
GeneralizedExtremeValue,
Geometric,
Gumbel,
Hurdle,
Hypergeometric,
InverseWishart,
InverseGamma,
@@ -138,6 +149,7 @@ export
Rayleigh,
Semicircle,
Skellam,
SpikeSlab,
StudentizedRange,
SymTriangularDist,
TDist,
@@ -152,6 +164,7 @@ export
WalleniusNoncentralHypergeometric,
Weibull,
Wishart,
ZeroInflated,
ZeroMeanIsoNormal,
ZeroMeanIsoNormalCanon,
ZeroMeanDiagNormal,
@@ -172,6 +185,8 @@ export
components, # get components from a mixture model
componentwise_pdf, # component-wise pdf for mixture models
componentwise_logpdf, # component-wise logpdf for mixture models
componentwise_pmf, # component-wise pmf for mixture models
componentwise_logpmf, # component-wise logpmf for mixture models
concentration, # the concentration parameter
convolve, # convolve distributions of the same type
dim, # sample dimension of multivariate distribution
@@ -199,6 +214,8 @@ export
loglikelihood, # log probability of array of IID draws
logpdf, # log probability density
logpdf!, # evaluate log pdf to provided storage
logpmf, # log probability mass
logpmf!, # evaluate log pmf to provided storage

invscale, # Inverse scale parameter
sqmahal, # squared Mahalanobis distance to Gaussian center
@@ -221,7 +238,8 @@ export
params, # get the tuple of parameters
params!, # provide storage space to calculate the tuple of parameters for a multivariate distribution like mvlognormal
partype, # returns a type large enough to hold all of a distribution's parameters' element types
pdf, # probability density function (ContinuousDistribution)
pdf, # probability density function (non-CountableSupport)
pmf, # probability mass function (non-ContinuousSupport)
probs, # Get the vector of probabilities
probval, # The pdf/pmf value for a uniform distribution
quantile, # inverse of cdf (defined for p in (0,1))
@@ -277,6 +295,7 @@ include("qq.jl")
include("estimators.jl")

# mixture distributions (TODO: moveout)
include("mixtures/mixturedist.jl")
include("mixtures/mixturemodel.jl")
include("mixtures/unigmm.jl")

@@ -290,6 +309,7 @@ API overview (major features):
- `d = Dist(parameters...)` creates a distribution instance `d` for some distribution `Dist` (see choices below) with the specified `parameters`
- `rand(d, sz)` samples from the distribution
- `pdf(d, x)` and `logpdf(d, x)` compute the probability density or log-probability density of `d` at `x`
- `pmf(d, x)` and `logpmf(d, x)` compute the probability mass or log-probability mass of `d` at `x`
- `cdf(d, x)` and `ccdf(d, x)` compute the (complementary) cumulative distribution function at `x`
- `quantile(d, p)` is the inverse `cdf` (see also `cquantile`)
- `mean(d)`, `var(d)`, `std(d)`, `skewness(d)`, `kurtosis(d)` compute moments of `d`
@@ -303,23 +323,26 @@ information.
Supported distributions:

Arcsine, Bernoulli, Beta, BetaBinomial, BetaPrime, Binomial, Biweight,
Categorical, Cauchy, Chi, Chisq, Cosine, DiagNormal, DiagNormalCanon,
Categorical, Cauchy, Chi, Chisq, CompoundDistribution,
Cosine, DiagNormal, DiagNormalCanon,
Dirichlet, DiscreteUniform, DoubleExponential, EdgeworthMean,
EdgeworthSum, EdgeworthZ, Erlang,
Epanechnikov, Exponential, FDist, FisherNoncentralHypergeometric,
Frechet, FullNormal, FullNormalCanon, Gamma, GeneralizedPareto,
GeneralizedExtremeValue, Geometric, Gumbel, Hypergeometric,
GeneralizedExtremeValue, Geometric, Gumbel, Hurdle, Hypergeometric,
InverseWishart, InverseGamma, InverseGaussian, IsoNormal,
IsoNormalCanon, Kolmogorov, KSDist, KSOneSided, Laplace, Levy,
Logistic, LogNormal, MatrixBeta, MatrixFDist, MatrixNormal, MatrixTDist, MixtureModel,
Multinomial, MultivariateNormal, MvLogNormal, MvNormal, MvNormalCanon,
MvNormalKnownCov, MvTDist, NegativeBinomial, NoncentralBeta, NoncentralChisq,
Logistic, LogNormal, MatrixBeta, MatrixFDist, MatrixNormal,
MatrixTDist, MixtureModel, Multinomial, MultivariateNormal, MvLogNormal,
MvNormal, MvNormalCanon, MvNormalKnownCov, MvTDist, NegativeBinomial,
NoncentralBeta, NoncentralChisq,
NoncentralF, NoncentralHypergeometric, NoncentralT, Normal, NormalCanon,
NormalInverseGaussian, Pareto, PGeneralizedGaussian, Poisson, PoissonBinomial,
QQPair, Rayleigh, Skellam, StudentizedRange, SymTriangularDist, TDist, TriangularDist,
NormalInverseGaussian, Pareto, PGeneralizedGaussian,
Poisson, PoissonBinomial, QQPair, Rayleigh, Skellam, SpikeSlab,
StudentizedRange, SymTriangularDist, TDist, TriangularDist,
Triweight, Truncated, TruncatedNormal, Uniform, UnivariateGMM,
VonMises, VonMisesFisher, WalleniusNoncentralHypergeometric, Weibull,
Wishart, ZeroMeanIsoNormal, ZeroMeanIsoNormalCanon,
Wishart, ZeroInflated, ZeroMeanIsoNormal, ZeroMeanIsoNormalCanon,
ZeroMeanDiagNormal, ZeroMeanDiagNormalCanon, ZeroMeanFullNormal,
ZeroMeanFullNormalCanon

80 changes: 58 additions & 22 deletions src/common.jl
@@ -13,9 +13,21 @@ struct Matrixvariate <: VariateForm end
`S <: ValueSupport` specifies the support of sample elements,
either discrete or continuous.
"""
abstract type ValueSupport end
struct Discrete <: ValueSupport end
struct Continuous <: ValueSupport end
abstract type ValueSupport{N} end
struct ContinuousSupport{N <: Number} <: ValueSupport{N} end
abstract type CountableSupport{C} <: ValueSupport{C} end
struct ContiguousSupport{C <: Integer} <: CountableSupport{C} end
struct UnionSupport{N1, N2,
S1 <: ValueSupport{N1},
S2 <: ValueSupport{N2}} <:
ValueSupport{Union{N1, N2}} end

const Discrete = ContiguousSupport{Int}
const Continuous = ContinuousSupport{Float64}
const DiscontinuousSupport{I, F} =
UnionSupport{I, F, <: CountableSupport{I},
ContinuousSupport{F}} where {I <: Number, F <: Number}
const Discontinuous = DiscontinuousSupport{Int, Float64}

## Sampleable

@@ -50,13 +62,14 @@ Base.size(s::Sampleable{Multivariate}) = (length(s),)

"""
eltype(s::Sampleable)
eltype(::ValueSupport)

The default element type of a sample. This is the type of elements of the samples generated
by the `rand` method. However, one can provide an array of different element types to
store the samples using `rand!`.
"""
Base.eltype(s::Sampleable{F,Discrete}) where {F} = Int
Base.eltype(s::Sampleable{F,Continuous}) where {F} = Float64
Base.eltype(::Sampleable{F, <: ValueSupport{N}}) where {F, N} = N
Base.eltype(::ValueSupport{N}) where {N} = N

"""
nsamples(s::Sampleable)
@@ -67,10 +80,11 @@ into an array, depending on the variate form.
nsamples(t::Type{Sampleable}, x::Any)
nsamples(::Type{D}, x::Number) where {D<:Sampleable{Univariate}} = 1
nsamples(::Type{D}, x::AbstractArray) where {D<:Sampleable{Univariate}} = length(x)
nsamples(::Type{D}, x::AbstractVector) where {D<:Sampleable{Multivariate}} = 1
nsamples(::Type{D}, x::AbstractArray{<:AbstractVector}) where {D<:Sampleable{Multivariate}} = length(x)
nsamples(::Type{D}, x::AbstractVector{<:Number}) where {D<:Sampleable{Multivariate}} = 1
nsamples(::Type{D}, x::AbstractMatrix) where {D<:Sampleable{Multivariate}} = size(x, 2)
nsamples(::Type{D}, x::Number) where {D<:Sampleable{Matrixvariate}} = 1
nsamples(::Type{D}, x::Array{Matrix{T}}) where {D<:Sampleable{Matrixvariate},T<:Number} = length(x)
nsamples(::Type{D}, x::AbstractMatrix{<:Number}) where {D<:Sampleable{Matrixvariate}} = 1
nsamples(::Type{D}, x::AbstractArray{<:AbstractMatrix{T}}) where {D<:Sampleable{Matrixvariate},T<:Number} = length(x)

"""
Distribution{F<:VariateForm,S<:ValueSupport} <: Sampleable{F,S}
@@ -85,23 +99,45 @@ abstract type Distribution{F<:VariateForm,S<:ValueSupport} <: Sampleable{F,S} en
const UnivariateDistribution{S<:ValueSupport} = Distribution{Univariate,S}
const MultivariateDistribution{S<:ValueSupport} = Distribution{Multivariate,S}
const MatrixDistribution{S<:ValueSupport} = Distribution{Matrixvariate,S}
const NonMatrixDistribution = Union{UnivariateDistribution, MultivariateDistribution}

const DiscreteDistribution{F<:VariateForm} = Distribution{F,Discrete}
const CountableDistribution{F<:VariateForm,
C<:CountableSupport} = Distribution{F,C}
const DiscreteDistribution{F<:VariateForm} = CountableDistribution{F,Discrete}
const ContinuousDistribution{F<:VariateForm} = Distribution{F,Continuous}

const DiscreteUnivariateDistribution = Distribution{Univariate, Discrete}
const ContinuousUnivariateDistribution = Distribution{Univariate, Continuous}
const DiscreteMultivariateDistribution = Distribution{Multivariate, Discrete}
const ContinuousMultivariateDistribution = Distribution{Multivariate, Continuous}
const DiscreteMatrixDistribution = Distribution{Matrixvariate, Discrete}
const ContinuousMatrixDistribution = Distribution{Matrixvariate, Continuous}

variate_form(::Type{Distribution{VF,VS}}) where {VF<:VariateForm,VS<:ValueSupport} = VF
variate_form(::Type{T}) where {T<:Distribution} = variate_form(supertype(T))

value_support(::Type{Distribution{VF,VS}}) where {VF<:VariateForm,VS<:ValueSupport} = VS
value_support(::Type{T}) where {T<:Distribution} = value_support(supertype(T))
const CountableUnivariateDistribution{C<:CountableSupport} =
UnivariateDistribution{C}
const DiscreteUnivariateDistribution =
CountableUnivariateDistribution{Discrete}
const ContinuousUnivariateDistribution = UnivariateDistribution{Continuous}
const CountableMultivariateDistribution{C<:CountableSupport} =
MultivariateDistribution{C}
const DiscreteMultivariateDistribution =
CountableMultivariateDistribution{Discrete}
const ContinuousMultivariateDistribution = MultivariateDistribution{Continuous}

const CountableMatrixDistribution{C<:CountableSupport} = MatrixDistribution{C}
const DiscreteMatrixDistribution = CountableMatrixDistribution{Discrete}
const ContinuousMatrixDistribution = MatrixDistribution{Continuous}

pdf(d::CountableDistribution) = pmf(d)
pdf(d::CountableDistribution, x) = pmf(d, x)
logpdf(d::CountableDistribution) = logpmf(d)
logpdf(d::CountableDistribution, x) = logpmf(d, x)

const CountableMultivariateDistribution{C<:CountableSupport} =
MultivariateDistribution{C}
const DiscreteMultivariateDistribution =
CountableMultivariateDistribution{Discrete}
const ContinuousMultivariateDistribution = MultivariateDistribution{Continuous}

const CountableMatrixDistribution{C<:CountableSupport} = MatrixDistribution{C}
const DiscreteMatrixDistribution = CountableMatrixDistribution{Discrete}
const ContinuousMatrixDistribution = MatrixDistribution{Continuous}


variate_form(::Type{<:Sampleable{VF, <:ValueSupport}}) where {VF<:VariateForm} = VF
value_support(::Type{<:Sampleable{<:VariateForm,VS}}) where {VS<:ValueSupport} = VS

# allow broadcasting over distribution objects
# to be decided: how to handle multivariate/matrixvariate distributions?
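The `pdf`-to-`pmf` forwarding added in this file can be sketched on its own, outside the package. The sketch below assumes a hypothetical table-based distribution (`ToyTable`) over arbitrary values. None of these names come from Distributions.jl, and `pmf`, `logpmf`, `pdf` and `logpdf` are local functions defined only for the illustration.

```julia
# All names below are hypothetical stand-ins defined locally for illustration.
abstract type ToySupport{T} end
abstract type ToyCountable{T} <: ToySupport{T} end
struct ToyArbitrary{T} <: ToyCountable{T} end

abstract type ToyDistribution{S<:ToySupport} end

# A finite distribution given as a table of values and their probabilities.
struct ToyTable{T} <: ToyDistribution{ToyArbitrary{T}}
    xs::Vector{T}
    ps::Vector{Float64}
end

# Probability mass at x; values outside the table have mass zero.
function pmf(d::ToyTable, x)
    i = findfirst(isequal(x), d.xs)
    return i === nothing ? 0.0 : d.ps[i]
end
logpmf(d::ToyTable, x) = log(pmf(d, x))

# Any distribution with countable support answers pdf queries via its pmf.
pdf(d::ToyDistribution{<:ToyCountable}, x) = pmf(d, x)
logpdf(d::ToyDistribution{<:ToyCountable}, x) = logpmf(d, x)

# The sample element type comes from the support's type parameter.
Base.eltype(::ToyDistribution{<:ToySupport{T}}) where {T} = T

d = ToyTable([1//2, 3//2], [0.25, 0.75])
println(pdf(d, 1//2))
println(eltype(d))
```

One consequence of forwarding in this direction is that existing callers of `pdf` on discrete distributions keep working, while new code can say `pmf` where a mass is meant.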
8 changes: 4 additions & 4 deletions src/edgeworth.jl
@@ -55,15 +55,15 @@ end


# Cornish-Fisher expansion.
function quantile(d::EdgeworthZ, p::Float64)
function quantile(d::EdgeworthZ, p::Real)
s = skewness(d)
k = kurtosis(d)
z = quantile(Normal(0,1),p)
z2 = z*z
z + s*(z2-1)/6.0 + k*z*(z2-3)/24.0 - s*s/36.0*z*(2.0*z2-5.0)
end

function cquantile(d::EdgeworthZ, p::Float64)
function cquantile(d::EdgeworthZ, p::Real)
s = skewness(d)
k = kurtosis(d)
z = cquantile(Normal(0,1),p)
@@ -112,5 +112,5 @@ cdf(d::EdgeworthAbstract, x::Float64) = cdf(EdgeworthZ(d.dist,d.n), (x-mean(d))/

ccdf(d::EdgeworthAbstract, x::Float64) = ccdf(EdgeworthZ(d.dist,d.n), (x-mean(d))/std(d))

quantile(d::EdgeworthAbstract, p::Float64) = mean(d) + std(d)*quantile(EdgeworthZ(d.dist,d.n), p)
cquantile(d::EdgeworthAbstract, p::Float64) = mean(d) + std(d)*cquantile(EdgeworthZ(d.dist,d.n), p)
quantile(d::EdgeworthAbstract, p::Real) = mean(d) + std(d)*quantile(EdgeworthZ(d.dist,d.n), p)
cquantile(d::EdgeworthAbstract, p::Real) = mean(d) + std(d)*cquantile(EdgeworthZ(d.dist,d.n), p)
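The effect of widening `p::Float64` to `p::Real` in these signatures can be seen in a minimal sketch. `narrow_q` and `wide_q` are hypothetical stand-ins for a quantile-like function, not functions from the package, and the formula `2p - 1` is arbitrary.

```julia
# Hypothetical stand-ins: same body, different argument annotations.
narrow_q(p::Float64) = 2p - 1
wide_q(p::Real) = 2p - 1

# Rational and Float32 probabilities are accepted by the Real method.
println(wide_q(1//4))
println(wide_q(0.25f0))

# The Float64-only method has no match for a Rational argument.
rejected = try
    narrow_q(1//4)
    false
catch err
    err isa MethodError
end
println(rejected)
```

Note that the `cdf` and `ccdf` methods shown as context in the hunk above still take `x::Float64`, so the same restriction would presumably still apply there.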
6 changes: 6 additions & 0 deletions src/functionals.jl
@@ -18,6 +18,12 @@ function expectation(distr::DiscreteUnivariateDistribution, g::Function, epsilon
sum(x -> f(x)*g(x), leftEnd:rightEnd)
end

function expectation(distr::CountableUnivariateDistribution,
g::Function, epsilon::Real)
f = x->pdf(distr,x)
sum(x -> f(x)*g(x), support(distr))
end

function expectation(distr::UnivariateDistribution, g::Function)
expectation(distr, g, 1e-10)
end
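The new countable-support method computes the expectation as a probability-weighted sum over `support(distr)`. A standalone sketch of that idea follows, assuming a hypothetical finite distribution type `ToyFinite` with locally defined `support`, `pmf` and `expectation` functions (none of them the package's own).

```julia
# Hypothetical finite distribution: values with their probabilities.
struct ToyFinite{T}
    xs::Vector{T}
    ps::Vector{Float64}
end

support(d::ToyFinite) = d.xs

# Probability mass at x; zero for values outside the support.
function pmf(d::ToyFinite, x)
    i = findfirst(isequal(x), d.xs)
    return i === nothing ? 0.0 : d.ps[i]
end

# E[g(X)] as a weighted sum over every point of the support.
expectation(d::ToyFinite, g::Function) = sum(x -> pmf(d, x) * g(x), support(d))

d = ToyFinite([0.5, 1.5, 4.0], [0.2, 0.3, 0.5])
println(expectation(d, identity))
println(expectation(d, x -> x^2))
```

A plain sum like this only terminates when the support is finite. The method in the diff accepts an `epsilon` argument but does not use it, so distributions with infinite countable support would presumably need the truncation approach used by the `DiscreteUnivariateDistribution` method.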