Remove degenerate cases #102
Comments
I think we'd have to evaluate this on a case-by-case basis for each distribution, but I agree that, at least for the Gamma distribution, it does not necessarily make sense to allow INF for
FYI, we've recently had someone interested in supporting a particular degenerate case of Exp.
The use case referenced to justify degenerate parameter values is likely erroneous: an absorbing state can be modelled by having all outgoing paths loop back to the same state. In that case any distribution would work, unless one is interested in statistics about how many times a transition (including transitions to the same state) occurred. I disagree with the proposed solution and agree with the Wikipedia page that the case is degenerate for a reason.
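The self-loop idea can be sketched as follows. This is a minimal illustration, assuming a hypothetical three-state chain; the names `TRANSITIONS`, `next_state` and `is_absorbing` are invented for the example and do not come from statrs or rand. The uniform draw `u` is passed in by the caller so that no RNG crate is needed.

```rust
/// Row-stochastic transition matrix for a hypothetical three-state chain.
/// State 2 is absorbing: its only outgoing transition is a self-loop.
const TRANSITIONS: [[f64; 3]; 3] = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
];

/// Pick the next state from `state`, given a uniform draw `u` in [0, 1).
fn next_state(state: usize, u: f64) -> usize {
    let mut cumulative = 0.0;
    for (next, p) in TRANSITIONS[state].iter().enumerate() {
        cumulative += p;
        if u < cumulative {
            return next;
        }
    }
    // Guard against rounding in the cumulative sum.
    TRANSITIONS[state].len() - 1
}

/// A state is absorbing when it returns to itself with probability 1.
fn is_absorbing(state: usize) -> bool {
    TRANSITIONS[state][state] == 1.0
}
```

Once the chain reaches state 2, `next_state(2, u)` returns 2 for every `u`, so the holding-time distribution used in that state no longer affects where the chain ends up.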
In addition to these complications, as stated above, either the cdf can exist while the pdf is a non-classical function, as is the case with a discrete probability, or both limits exist but aren't cdfs and pdfs anymore. If degenerate cases are allowed where not all the functions exist in a classical sense, there's an argument to be made that returning errors is more helpful than approximating a delta function with p(x0) = infinity and p(x) = 0 for all other x.
Perhaps I got lost in the mathematics on this one; I take some of that back. If there's only one parameter apart from the variable of interest, then I agree that More general Exp is a reasonable extension. In the case of the Gamma distribution, however, there are two parameters, so knowing in which order to take the limit doesn't have an equally reasonable default.
Thanks for the analysis and follow-up comment. I agree: the Exp case seems unambiguous, but in general that is not the case. It may be that allowing for degenerates in only one variable is also a sensible option (but I haven't thought enough about the Gamma case).
Author of the More general Exp issue here.
Exactly!! This is what I had in mind. In the case of On the other hand, a Gamma with both parameters infinite has no intuitive behaviour.
I totally agree with that, but not only because there can be an intuitive answer for some cases but also because there can be useful (and arbitrary) implementations. Just as an example:
As I outlined with the mathematical infinities that get confused with IEEE754 infinities, the semantics are not identical. This starts to get unintuitive because
Yes, but you missed the bit about convergence:
Based on the discussion my current proposal would be
I think an error is preferable to NaN on the last point, but this is at least not immediately desirable due to being backwards-incompatible.
I agree with these principles and would add the following:
I am not sure of the consequences of following these premises, but we could start with something:
Of course, the list can be longer and implementing all these changes might not be what we want(?)
I agree, with the caveats below, especially since errors are returned when constructing the distributions, which gives immediate feedback on what to expect.
This is at odds with the solution in the rand crate. Finite mean and infinity std_dev do not make the distribution converge towards any other, but neither did the exponential distribution. This would deny extending some properties of the distribution in a meaningful way, like the mean and the std_dev, which do have sensible limits. More general Exp does not care whether the distribution converges, only if one selected property can be extended consistently. I would even argue that using the logic used to extend the exponential clock one would conclude that the limit of sampling of a normal distribution with mean 0 and infinite variance is infinity with probability 1/2 and -infinity with probability 1/2, so does have a limit.
Strongly disagree. We should model established mathematical practice as closely as possible. The theory of convergence of random variables has formal methods for characterising when random variables are degenerate cases of each other. I thus reverse my reversal on the issue and think More general Exp is a bad idea, because it selectively extends properties derived from a distribution that cannot formally be shown to converge to any distribution. It breaks mathematics in subtle ways. The principled approach would be to pick one of the modes of convergence and extend to degenerate cases only if such a limit exists under the chosen mode, and to refuse to derive any properties from nonexistent distribution limits. Edit: big rewrite for clarification purposes
Yes, indeed! and any function that can fail should return a
Exactly, that is why I proposed to not accept this case either:
Convergence
Maybe it was not so explicit, but the convergence we considered to solve More general Exp was "the intuitive behaviour a user would expect", which I think is still the correct thing to do. To translate into a mathematical convergence one needs a sequence of random variables. Sadly, there is no "good sequence" to consider. Let's say we want to claim something like "
Intuitive convergence
If intuitive convergence is not almost sure convergence, what should it be? I think that, if there is a characterization of Usually, such a characterization is the one used when simulating the variable, so in the context of the rand crate is easy to get an agreement. Maybe, if such a characterization is also part of the documentation in statrs, then the expected behaviour will also be "intuitive".
Practicality
This is purely from a practical point of view. Of course one would like the crate to be as complete as possible.
Appendix
Consider a sequence of variables
Proof:
Case 1: not convergent. Consider the sequence
Case 2: convergent. Consider a sequence of Exponential variables
Define the event
This is still divergence, so I don't think it's valid to say that it has a limit.
In practice, many things don't get implemented until requested. But if we can tackle degenerate cases comprehensively, that would be great. (This also implies documenting why certain degenerate cases should not return a solution.)
Example? The distribution is only (originally) defined for positive parameters. Ah, this is "Case 1: not convergent." This is an (informal) proof that for any It should be clear that for any fixed
Yes, that was exactly what I was trying to say. I agree that it does not converge almost surely (although I disagree with the proof), which is why there's a mismatch between rand and statrs. It seems to me we can agree that the returning of errors upon calling It boils down to two questions:
I'm aware this is not a formal argument. I was using the intuitive version of convergence to sample how far one is willing to take this.
I'm not sure where this is heading. Are you suggesting to implement different library functions for
Actually, yes, but not now. It's also beside the point. The point is that for any situation using one of these libs, there is some limit on precision and some limit on the number of samples. It doesn't matter what those limits are, only that they exist; given that, the argument above is that for I haven't really considered other distributions yet. For The above analysis is to say that however we choose to solve this problem, (a) case-by-case analysis will be needed and (b) one cannot consider this a pure-maths problem. Perhaps we should start by asking questions like:
The issue with this approach is that we are not dealing with real numbers; we are dealing with floating-point (finite-precision) numbers, which include infinity. Therefore, one should focus less on the math here and more on the actual numbers on the computer. INF is useful, for example, to convey "precision limit overflow". Returning NaN would express too little of what is actually happening and, in some situations, +INF or -INF can be used by the user.
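A small sketch of the "precision limit overflow" point, assuming nothing beyond Rust's standard `f64` type. The helper names `double` and `describe` are invented for the illustration; they show that an overflowing computation yields a signed infinity that the caller can still inspect, whereas NaN carries no sign or direction.

```rust
/// Doubles `x`; for finite inputs near the top of the f64 range the
/// result overflows and IEEE 754 arithmetic yields an infinity.
fn double(x: f64) -> f64 {
    x * 2.0
}

/// Classifies a result so a caller can react to overflow explicitly.
fn describe(x: f64) -> &'static str {
    if x.is_nan() {
        "nan"
    } else if x == f64::INFINITY {
        "+inf"
    } else if x == f64::NEG_INFINITY {
        "-inf"
    } else {
        "finite"
    }
}
```

For instance, `describe(double(f64::MAX))` reports an overflow towards positive infinity, while `describe(f64::INFINITY - f64::INFINITY)` reports NaN.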
Totally agree.
Well, you know my answer. It boils down to "it can be more useful to a user than an error", especially since INF can already be obtained (recall finite precision), as @dhardy pointed out.
I think it would be strange for a user to get a NaN value. If I get NaN while doing computations, it is because either (1) I went beyond the precision, or (2) some computation can actually fail and I should replace it with another. For me, NaNs are like errors, and we should try to document them if there is a possibility of returning them.
If evaluating an expression in infinite precision and range would yield a value that should not overflow/NaN when the result is expressed in fp, for example (f64::MAX + 1.0) - 1.0, yet the fp calculation yields overflow/NaN, then this is due to intermediate rounding/overflow errors propagating and it is the job of numerical analysis to find an expression that can be accurately evaluated in fp. There's no disagreement in that. OP was about what to do with point-blank degenerate inputs and to figure out what the sensible result should be, given the expressed interest in expanding the domain of the methods defined in statrs. Established practice in this library is to take the limit whenever mathematical functions have to be evaluated at degenerate values, which is why every pdf evaluates to 0 at inf. Otherwise not taking mathematical limits would lead to NaN values, like with the pdf of gamma:
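A minimal sketch of how such a NaN can arise, assuming a hand-written density rather than statrs code. For Gamma(shape = 2, rate = 1) the density is x * exp(-x); the function names `naive_pdf` and `limit_pdf` are hypothetical. Evaluated literally at x = inf, the formula multiplies inf by 0, which IEEE 754 defines as NaN, even though the mathematical limit is 0.

```rust
/// Density of Gamma(shape = 2, rate = 1) at `x`, i.e. x * exp(-x),
/// evaluated exactly as written.
fn naive_pdf(x: f64) -> f64 {
    x * (-x).exp()
}

/// Same density, with the value at +inf replaced by the limit
/// of x * exp(-x) as x -> inf, which is 0.
fn limit_pdf(x: f64) -> f64 {
    if x == f64::INFINITY {
        0.0
    } else {
        naive_pdf(x)
    }
}
```

Here `naive_pdf(f64::INFINITY)` is NaN, while `limit_pdf(f64::INFINITY)` is 0.0 and agrees with `naive_pdf` everywhere else.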
This conversation has switched rather abruptly from "what to do with degenerate inputs" to "what inputs to consider degenerate" and "what are reasonable outputs". My answers to your questions are
Questions of my own are
@rasa200
I'm afraid I don't understand what you're saying. If I give you exactly inf as an input and ask you to make sense of it, something I cautioned in OP against doing, you've left the place where you have a reasonable expectation of fp giving you anything sensible. Occasionally this might happen, like 1/inf = 0, but often you'll get expressions like inf / inf, inf - inf, 0 * inf, where the crude rules in fp will give NaN because there's no limit for such expressions that's universally true in mathematics. If you evaluate |
It's not my goal — I was merely pointing out other limitations, but as you say, it's a bit off topic.
Where a mathematical limit exists, this is fine. The problem that came up was @rasa200's "proof of non-convergence" of
It should be clear that Of course, one has to be careful; e.g. Wikipedia gives an example with discontinuous CDF which may be more of an issue. This thus goes back to the question of which definition of convergence should be used?
Good question. I don't know. Rand does not implement CDF or PDF and probably won't ever do so, and tends to consider speed quite important (though we still haven't resolved whether to include high-precision alternatives). Even so, I don't believe there is a reason to choose different approaches to degenerate cases. @vks may also have an opinion.
Usually only inputs close to FP limits can cause this. I don't believe we should care very much about this case, at least not when the only error is to generate obvious error/degenerate values like NaN and INF.
I believe the two are equivalent (there are only two infinity values).
That's wrong (doc). In any case, one does not usually check direct equality, except at degenerate values, where this is not an issue. |
You don't think it's awkward to allow returning a pdf without signaling an error, even though this mechanically extended pdf does not have the properties of a pdf, like integrating to 1? What do you think Gamma(1,inf).max() should be? The limits do not commute. Taken as a limit in the almost-sure sense, (lim z->inf Gamma(1,z)).max() = 0, while lim z->inf (Gamma(1,z).max()) = inf. The generalisation of Exp in the rand crate deals with the latter limit. I argue the former limit should be used in statrs. I think the edge case needed by @rasa200 can be dealt with by providing a Delta distribution that returns a constant when sampled, and making the constant inf. If the limit of the distribution does not exist in the a.s. sense, statrs should return errors.
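The Delta distribution suggested above could look roughly like this. It is a sketch under stated assumptions: the `Delta` type, its `new`, `sample` and `cdf` methods, and the choice to reject only NaN are all hypothetical and are not an existing statrs or rand API.

```rust
/// Hypothetical point-mass ("Delta") distribution: all mass at `value`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Delta {
    value: f64,
}

impl Delta {
    /// Rejects NaN, but allows +/- infinity as the location of the mass.
    fn new(value: f64) -> Result<Self, &'static str> {
        if value.is_nan() {
            Err("value must not be NaN")
        } else {
            Ok(Delta { value })
        }
    }

    /// Sampling needs no randomness: every draw is the same constant.
    fn sample(&self) -> f64 {
        self.value
    }

    /// P(X <= x): a step from 0 to 1 at `value`.
    fn cdf(&self, x: f64) -> f64 {
        if x >= self.value {
            1.0
        } else {
            0.0
        }
    }
}
```

With `Delta::new(f64::INFINITY)`, every sample is inf, which covers the "waiting time is infinite" use case without extending the parameter domain of Exp or Gamma.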
There are some non-obvious things going on with IEEE754 which make me uncertain whether this is the case [0]. I think they're not equivalent.
Thanks for pointing that out. Absolute difference is of course rubbish.
I guess @boxtown should answer that since he's the maintainer of this lib. Yes, it is unexpected that the PDF integrates to 0 and the CDF is never 1. On the other hand, it is still possible to implement the API according to its expectations (aside from those implicit ones on the properties of a PDF/CDF). I would never expect numerical methods to be able to accurately quantify the integral of a PDF or invert a CDF (without also providing bounds on the region of interest, which here obviously don't exist), hence this may not be a problem in practice. I don't know whether this is too awkward.
I'd like to get the degenerate cases removed and documented. I hadn't noticed this discussion, but I've got an open PR for ensuring that |
Want to get the release out in hopes to get more feedback on what users expect on these cases.
@FreezyLemon would you say that you removed the apparent degenerate cases when replacing the errors?
@YeungOnion If you mean #265, I tried not to touch the existing behavior (what params get accepted/rejected), but instead made the error types more specific. #299 is IMHO a case of me explicitly removing a degenerate case from a distribution, but I haven't gone through all of them. So no, not really.
In some places, like the Gamma function, some effort is made to deal with degenerate
parameters like infinity. I think this is too much effort for a minute use case, as
I don't think users will insert these values to query the library to see what
degenerate distributions look like.
Having to deal with the special cases reduces clarity and complicates the API
considerably, as it forces us to contend with mathematical details that are difficult to convey
in a numerical library. As the limit α -> inf is approached, the pointwise
limit of the pdf and the cdf is zero, so these aren't pdfs and cdfs anymore. As the
limit β -> inf is approached, the continuous distribution degenerates to a discrete
one: P(X = 0) = 1.
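
The β -> inf behaviour can be checked numerically in the simplest case. The sketch below assumes shape α = 1, where the Gamma cdf reduces to 1 - exp(-βx); the function name `cdf_shape_one` is invented for the illustration and is not a statrs function.

```rust
/// CDF of Gamma(shape = 1, rate) at `x >= 0`, i.e. 1 - exp(-rate * x).
/// `exp_m1` computes exp(y) - 1 accurately for small |y|.
fn cdf_shape_one(rate: f64, x: f64) -> f64 {
    -(-rate * x).exp_m1()
}
```

For a fixed small x such as 1e-6, the cdf is about 1e-6 at rate 1 but rounds to 1.0 at rate 1e9, so nearly all the mass has collapsed towards 0. At x = 0 itself the cdf stays at 0 for every finite rate, which is why the limit is a discrete distribution rather than a continuous one.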
I propose to remove all special cases and document why they aren't handled.
This would deal with #57 and #98.
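
One way the proposal could look in code, as a sketch only: the `GammaParams` type and the `ParamError` enum below are hypothetical stand-ins, and statrs' real types and error variants are not reproduced here. The constructor accepts only finite, strictly positive parameters, so infinite and NaN inputs are rejected up front instead of being special-cased later.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ParamError {
    ShapeInvalid,
    RateInvalid,
}

/// Hypothetical Gamma parameter holder, not the statrs type.
#[derive(Debug, PartialEq)]
struct GammaParams {
    shape: f64,
    rate: f64,
}

impl GammaParams {
    /// Accepts only finite, strictly positive parameters.
    /// NaN fails `is_finite`, so it is rejected as well.
    fn new(shape: f64, rate: f64) -> Result<Self, ParamError> {
        if !(shape.is_finite() && shape > 0.0) {
            return Err(ParamError::ShapeInvalid);
        }
        if !(rate.is_finite() && rate > 0.0) {
            return Err(ParamError::RateInvalid);
        }
        Ok(GammaParams { shape, rate })
    }
}
```

Because the check happens at construction time, methods such as the pdf and cdf never see a degenerate parameter and need no limit handling of their own.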