Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Traits in statistics::traits should not require return type be Option #295

Open
YeungOnion opened this issue Sep 18, 2024 · 5 comments
Open
Milestone

Comments

@YeungOnion
Copy link
Contributor

Many distributions will have some summary statistics regardless of their parameters so Option<T> should not be required, e.g. it is not semantically accurate to require unwrapping the mean of a binomial. Regardless of the sample probability, a valid Binomial type will always have some mean, but student's distribution require at least one dof, cauchy has too much tail and has no mean is never valid.

proposed solution

In lieu of generics, we can use associated types for the statistics::traits::Distribution trait1. I don't believe this would bound us in terms of generic numerics in the future either, as long as the associated type could support the generic as well, i.e.

impl<T: Float> Distribution for DistributionType {
    type Mu = T;   // or Vector<T, Dimension, ...>
    type Var = T;  // or TensorCube<T, Const<2>, Dimension, ...>
    type Skew = T; // or TensorCube<T, Const<3>, Dimension, ...>
    type Kurt = T; // or TensorCube<T, Const<4>, Dimension, ...>
}
  • collapse MeanN and VarianceN into statistics::traits::Distribution and add Covariance and Entropy traits
    • Add MultivariateNormalDiag distribution. #208 proposed a way to specify Covariance
    • rely on unimplemented!() over None when more correct. Defaulting to None when we don't have a simple expression isn't correct (may also be a little more motivating for someone to implement it over panicking).
    • would have to implement Covariance on matrices, Cholesky decompositions, vectors, and likely floats.
  • types for skewness and kurtosis will be higher order tensors than nalgebra can accomodate. so the signature could have return type of never ! before being plain ol' unimplemented!() before being implemented.

Sample code of the traits I propose adding.

pub trait Moments<T: Float> {
    type Mu;
    type Var: Covariance<T>;
    type Kurt;
    type Skew;
    fn mean(&self) -> Self::Mu;
    fn variance(&self) -> Self::Var;
    fn std_dev(&self) -> <Self::Var as Covariance<T>>::V;
    fn excess_kurtosis(&self) -> Self::Kurt;
    fn skewness(&self) -> Self::Skew;
}

pub trait Covariance<T: Float> {
    type M;
    type V;
    fn dense(&self) -> Self::M;
    fn sparse(&self) -> Self::V;
    fn forward(&self, other: Self::V) -> Self::V;
    fn inverse(&self, other: Self::V) -> Self::V;
    fn determinant(&self) -> T;
}

pub trait Entropy<T: Float> {
    fn entropy(&self) -> T;
}

Footnotes

  1. I also think renaming it could be helpful, unsure what is better, but since most of them are moments, or central moments, we could make entropy it's own, as entropy is always scalar and have a Moments trait. I think much of this is motivated from reference implementations in Math.NET's interfaces since that's how this project started.

@YeungOnion
Copy link
Contributor Author

Also going to say that I'm not tied to this implementation, but I align well to the issue title. I don't think it makes sense to return Option for all of them.

@FreezyLemon
Copy link
Contributor

I think your proposal is good. It should be a clear benefit to avoid returning Option if it's statically known to not be needed. Some unordered thoughts:

  • What is the return type of e.g. fn mean for a distribution which has no defined mean. Maybe a unit type ()? Or should these be moved outside of trait Distribution and only be implemented if a distribution supports them?

  • rely on unimplemented!() over None when more correct

    This is great, maybe remove the default implementations of mean etc.1 while you're at it. It's not immediately clear if a missing implementation of mean points to it being undefined or not implemented yet. That should probably be handled by the implementor.2

  • There's a possibility that the new traits don't even need a T: Float trait bound

  • why is trait Distribution in module statrs::statistics and not statrs::distribution again?

Footnotes

  1. Excluding std_dev

  2. Another thing: The default implementations have function documentation following the pattern "Returns X, if it exists". But we know that all of these return None unless overridden by a manual implementation (which should have its own, overriding docs) so this text almost always actually means "Returns None"

@YeungOnion
Copy link
Contributor Author

I hadn't thought of this one, but the API I designed doesn't work well for fallible types,

I can't impl<T> Covariance<Option<T>> for Option<T> separately from impl<T> Covariance<T> for T because of the standard deviation1. Standard deviation could be its own trait or simply implemented on types where it is useful2, and then I could drop the "note to implementors" on std_dev method, quoted what I came up with below,

Note for implementors

Associated types should capture semantics of the distribution, e.g.
[Option] should be used where moments is defined or undefined based on parameter values.
[()] is used for moments that are never defined

Footnotes

  1. I don't think it would be good semantics to do so, even if it is more ergonomic, calling covariance methods in map on Option<T: Covariance> types would be better than Option<f64>::default().forward(1).

  2. at the moment, this is my preference.

@FreezyLemon
Copy link
Contributor

FreezyLemon commented Sep 26, 2024

I can't impl<T> Covariance<Option<T>> for Option<T> separately from impl<T> Covariance<T> for T because of the standard deviation.

I'm not sure I understand this point.. can you elaborate or maybe give an example? It seems possible but I'm not sure I understand your proposal correctly.

EDIT: I just saw #304. I'll take a look and see if I can't figure it out

@YeungOnion
Copy link
Contributor Author

YeungOnion commented Sep 27, 2024

To couple the types for the return of variance and std_dev I have
Type Var: Covariance and the return type of std_dev is <Var as Covariance>::V

Perhaps I should be using the return of variance as the type from dense, Covariance::M, but the other option is not constraining a type relationship between variance and std_dev by removing std_dev or adding another associated type (which I think is unnecessarily flexible).

EDIT: on mobile, but within 24h I can push the WIP commit. It won't compile, but it's where I first noticed it; was while implementing moments for the F-distribution.

@YeungOnion YeungOnion modified the milestones: 0.18, Future, 0.19 Dec 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

When branches are created from issues, their pull requests are automatically linked.

2 participants