Add Empirical() distribution? #98
Comments
Hmmm. This is interesting. Morally I feel like the Empirical distribution is a discrete distribution over the observed data with density
That was also my first impulse. Reto convinced me that, for the data he was interested in, an approximated density would be more useful in practice. But given your reaction I would say we should make the discrete behavior the default. The only remaining question is whether we allow histogram/kernel density to be available as alternatives "for convenience". Would you be open to that?
I'm fairly opposed as currently implemented, because the class structure does not map onto the structure of the statistical objects. I think the right way to do this is to have In practice, I don't know a convenient way to implement a bunch of parametric
Question
For some of our practical meteorological applications we need to deal with probability distributions and probabilistic forecasts that are made via empirical distributions (so-called "ensembles" in the weather forecasting). So to enable all the nice infrastructure from `distributions3` and `topmodels` we (= mostly @retostauffer with some input from me) have written a first draft of a `distribution` and corresponding `d`/`p`/`q`/`r` functions.
My feeling is that this distribution class is of general interest and could also be relevant in introductory statistics. So should we prepare a PR for `distributions3`?
Implementation strategy
The idea is that the empirical observations are handled internally like the parameters of a probability distribution. The moments and quantiles are computed from the empirical observations directly. The `cdf()` uses the empirical CDF (ECDF) and the `pdf()` can either use `hist()` or `density()`.
Illustration
Packages: The current code is in `topmodels` but unexported at the moment.
Set up a simple empirical sample and a shifted version with some extra observations to obtain another empirical sample:
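A minimal base R sketch of such a setup could look as follows. The values in `x`, the shift of 1.5, and the two extra observations are hypothetical and chosen only for illustration; the unexported `topmodels` constructor is deliberately not used here.

```r
# Hypothetical data, chosen only for illustration.
x <- c(2.1, 3.4, 3.9, 4.4, 5.0, 5.8, 7.2)

# A second sample: x shifted by 1.5, plus two extra observations.
y <- c(x + 1.5, 9.5, 10.2)

# The two samples may have different sizes.
length(x)
length(y)
```

An object holding several such samples of different lengths would probably need a list-like internal representation rather than a fixed set of parameter columns.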
Moments and quantiles are straightforward:
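As a rough sketch in base R (hypothetical data, not the `topmodels` implementation), moments and quantiles can be computed directly from the observations. Which variance and which quantile definition an empirical distribution class should use is an open design choice, so both common variants are shown.

```r
# Hypothetical sample, chosen only for illustration.
x <- c(2.1, 3.4, 3.9, 4.4, 5.0, 5.8, 7.2)
n <- length(x)

# First moment.
m <- mean(x)

# Sample variance (denominator n - 1) versus plug-in variance (denominator n).
v_sample <- var(x)
v_plugin <- mean((x - m)^2)

# Quantiles: type = 7 is R's default (interpolating) definition,
# type = 1 is the inverse of the ECDF and returns observed values only.
probs <- c(0.1, 0.5, 0.9)
q_default <- quantile(x, probs = probs)
q_inverse <- quantile(x, probs = probs, type = 1)
```

With `type = 1` every quantile is one of the observed values, which matches the view of the empirical distribution as a discrete distribution over the data.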
Random sampling is with replacement (i.e., corresponding to bootstrapping):
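In base R, sampling with replacement from the observations can be sketched with `sample()`. The data, the seed, and the number of draws are hypothetical.

```r
# Hypothetical sample, chosen only for illustration.
x <- c(2.1, 3.4, 3.9, 4.4, 5.0, 5.8, 7.2)

# Fix the seed so the draws are reproducible.
set.seed(1)

# Draw 1000 values with replacement, i.e., a bootstrap-style resample.
draws <- sample(x, size = 1000, replace = TRUE)

# Every draw is one of the original observations.
all(draws %in% x)

# Relative frequencies approach 1/n for each distinct observation.
table(draws) / length(draws)
```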
The ECDF is uniquely defined but the PDF via histogram or kernel density depends on binning/bandwidth:
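The contrast can be sketched in base R with `ecdf()`, `density()`, and `hist()`. The data, bandwidths, and break points below are hypothetical, and the interpolation via `approxfun()` is just one possible way to evaluate a kernel density estimate at a given point.

```r
# Hypothetical sample, chosen only for illustration.
x <- c(2.1, 3.4, 3.9, 4.4, 5.0, 5.8, 7.2)

# ECDF: a step function with jumps of 1/n at each observation.
Fn <- ecdf(x)
Fn(4.4)  # 4 of the 7 observations are <= 4.4

# Kernel density estimates with two different bandwidths.
d_narrow <- density(x, bw = 0.3)
d_wide   <- density(x, bw = 1.0)

# Interpolate each estimate so it can be evaluated at an arbitrary point.
f_narrow <- approxfun(d_narrow$x, d_narrow$y, yleft = 0, yright = 0)
f_wide   <- approxfun(d_wide$x, d_wide$y, yleft = 0, yright = 0)
c(narrow = f_narrow(4.4), wide = f_wide(4.4))

# Histogram densities with two different binnings.
h_coarse <- hist(x, breaks = seq(2, 8, by = 2), plot = FALSE)
h_fine   <- hist(x, breaks = seq(2, 8, by = 1), plot = FALSE)
h_coarse$density
h_fine$density
```

The value of `Fn(4.4)` does not depend on any tuning choice, whereas the density at the same point changes with both the bandwidth and the bin width.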