Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: First take on cleaning up EmpiricalUnivariateDistribution #661

Closed
wants to merge 1 commit into from

Conversation

andreasnoack
Copy link
Member

@andreasnoack andreasnoack commented Sep 2, 2017

This one was pretty broken. I think I'll change the meanpdf function to utilize that the vector is sorted.

@juliohm
Copy link
Contributor

juliohm commented Sep 2, 2017

Maybe parametrize the distribution with T<:Real instead of explicitly using Float64?

@andreasnoack andreasnoack changed the title WIP: First take on cleaning up EmpiricalUnivariateDistribution First take on cleaning up EmpiricalUnivariateDistribution Sep 2, 2017
@andreasnoack andreasnoack changed the title First take on cleaning up EmpiricalUnivariateDistribution RFC: First take on cleaning up EmpiricalUnivariateDistribution Sep 2, 2017
@andreasnoack
Copy link
Member Author

Hm. It looks like we are assuming that DiscreteUnivariateDistributions can only be integer valued so, in general, an empirical distribution won't be a DiscreteUnivariateDistribution. Seems overly restrictive to me.

I'd also really like to rename this distribution to Empirical and since it was so broken, I considering if it can be done without deprecating it. Objections?

@juliohm
Copy link
Contributor

juliohm commented Sep 14, 2017

I am in favor of new names, but ideally they would take into account continuous vs. discrete cases. I also agree that the implementation was quite broken to deserve a deprecation warning.

I am particularly interested in the continuous case as discussed here: https://discourse.julialang.org/t/empirical-distribution-type-for-continuous-variables/5676

In one of my spatial stats solvers I need to transform continuous empirical distributions to Normal, do the calculations, and transform back to the initial empirical distribution.

In that post, I wrote a simple transformation function that I am pasting here again:

"""
    transform(x, dist)

Transform the empirical distribution of samples `x` into `dist`.
"""
function transform(x::Vector, dist::ContinuousUnivariateDistribution)
    idx = sortperm(x)
    
    N = length(x)
    xcdf = [1:N-1; .99N] / N
   
    quantile(dist, xcdf[idx])
end

This function could be incorporated in Distributions.jl as well, it only depends on a Distribution type for which quantile is implemented.

@matbesancon
Copy link
Member

@andreasnoack there seem to be conflicts to solve. Afterwards what's the status for this PR?

@andreasnoack
Copy link
Member Author

andreasnoack commented Sep 20, 2018

It stalled when I realized that it is implicitly assumed that Discrete here means $\mathbb{N}_0$ which therefore doesn't cover the empirical distribution. I then thought about reworking the whole declared type tree in Distributions which I thought was (and still think is) both insufficient and unnecessarily complex but then I realized that I wouldn't have time for that. However, I still dream about redesigning the abstract types (and more out multivariate and matrix valued distributions.)

@matbesancon
Copy link
Member

Agreed, this joins other issues, like #771
This is likely to need a complete re-work when we get the type hierarchy right.
I came to the same conclusion that we need discrete distributions as in over a discrete set when working on MixedDistributions:
https://github.com/matbesancon/MixedDistributions.jl/blob/master/src/simple_discrete.jl

@matbesancon
Copy link
Member

Should we submit an issue documenting what the intent is and the requirements on the type hierarchy, then close this PR till the required changes are carried out?

@andreasnoack
Copy link
Member Author

I think we keep this one open as it should be fine provided we get an appropriate distribution supertype for the distribution. I think #681 might cover, at least some of, what is needed.

@matbesancon matbesancon added the wont-merge PR will not be merged, kept for archiving purpose label Feb 10, 2019
@andreasnoack andreasnoack deleted the anj/empirical branch September 4, 2020 11:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wont-merge PR will not be merged, kept for archiving purpose
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants