Traits in `statistics::traits` should not require return type be `Option` #295

YeungOnion · 2024-09-18T02:39:44Z

Many distributions will have some summary statistics regardless of their parameters so Option<T> should not be required, e.g. it is not semantically accurate to require unwrapping the mean of a binomial. Regardless of the sample probability, a valid Binomial type will always have some mean, but student's distribution require at least one dof, cauchy has too much tail and has no mean is never valid.

proposed solution

In lieu of generics, we can use associated types for the statistics::traits::Distribution trait¹. I don't believe this would bound us in terms of generic numerics in the future either, as long as the associated type could support the generic as well, i.e.

impl<T: Float> Distribution for DistributionType {
    type Mu = T;   // or Vector<T, Dimension, ...>
    type Var = T;  // or TensorCube<T, Const<2>, Dimension, ...>
    type Skew = T; // or TensorCube<T, Const<3>, Dimension, ...>
    type Kurt = T; // or TensorCube<T, Const<4>, Dimension, ...>
}

collapse MeanN and VarianceN into statistics::traits::Distribution and add Covariance and Entropy traits
- Add MultivariateNormalDiag distribution. #208 proposed a way to specify Covariance
- rely on unimplemented!() over None when more correct. Defaulting to None when we don't have a simple expression isn't correct (may also be a little more motivating for someone to implement it over panicking).
- would have to implement Covariance on matrices, Cholesky decompositions, vectors, and likely floats.
types for skewness and kurtosis will be higher order tensors than nalgebra can accomodate. so the signature could have return type of never ! before being plain ol' unimplemented!() before being implemented.

Sample code of the traits I propose adding.

pub trait Moments<T: Float> {
    type Mu;
    type Var: Covariance<T>;
    type Kurt;
    type Skew;
    fn mean(&self) -> Self::Mu;
    fn variance(&self) -> Self::Var;
    fn std_dev(&self) -> <Self::Var as Covariance<T>>::V;
    fn excess_kurtosis(&self) -> Self::Kurt;
    fn skewness(&self) -> Self::Skew;
}

pub trait Covariance<T: Float> {
    type M;
    type V;
    fn dense(&self) -> Self::M;
    fn sparse(&self) -> Self::V;
    fn forward(&self, other: Self::V) -> Self::V;
    fn inverse(&self, other: Self::V) -> Self::V;
    fn determinant(&self) -> T;
}

pub trait Entropy<T: Float> {
    fn entropy(&self) -> T;
}

I also think renaming it could be helpful, unsure what is better, but since most of them are moments, or central moments, we could make entropy it's own, as entropy is always scalar and have a Moments trait. I think much of this is motivated from reference implementations in Math.NET's interfaces since that's how this project started. ↩

The text was updated successfully, but these errors were encountered:

YeungOnion · 2024-09-18T02:47:32Z

Also going to say that I'm not tied to this implementation, but I align well to the issue title. I don't think it makes sense to return Option for all of them.

FreezyLemon · 2024-09-20T06:56:15Z

I think your proposal is good. It should be a clear benefit to avoid returning Option if it's statically known to not be needed. Some unordered thoughts:

What is the return type of e.g. fn mean for a distribution which has no defined mean. Maybe a unit type ()? Or should these be moved outside of trait Distribution and only be implemented if a distribution supports them?
rely on unimplemented!() over None when more correct

This is great, maybe remove the default implementations of mean etc.¹ while you're at it. It's not immediately clear if a missing implementation of mean points to it being undefined or not implemented yet. That should probably be handled by the implementor.²
There's a possibility that the new traits don't even need a T: Float trait bound
why is trait Distribution in module statrs::statistics and not statrs::distribution again?

Excluding std_dev ↩
Another thing: The default implementations have function documentation following the pattern "Returns X, if it exists". But we know that all of these return None unless overridden by a manual implementation (which should have its own, overriding docs) so this text almost always actually means "Returns None" ↩

YeungOnion · 2024-09-26T17:54:01Z

I hadn't thought of this one, but the API I designed doesn't work well for fallible types,

I can't impl<T> Covariance<Option<T>> for Option<T> separately from impl<T> Covariance<T> for T because of the standard deviation¹. Standard deviation could be its own trait or simply implemented on types where it is useful², and then I could drop the "note to implementors" on std_dev method, quoted what I came up with below,

Note for implementors

Associated types should capture semantics of the distribution, e.g.
[Option] should be used where moments is defined or undefined based on parameter values.
[()] is used for moments that are never defined

I don't think it would be good semantics to do so, even if it is more ergonomic, calling covariance methods in map on Option<T: Covariance> types would be better than Option<f64>::default().forward(1). ↩
at the moment, this is my preference. ↩

FreezyLemon · 2024-09-26T18:54:23Z

I can't impl<T> Covariance<Option<T>> for Option<T> separately from impl<T> Covariance<T> for T because of the standard deviation.

I'm not sure I understand this point.. can you elaborate or maybe give an example? It seems possible but I'm not sure I understand your proposal correctly.

EDIT: I just saw #304. I'll take a look and see if I can't figure it out

YeungOnion · 2024-09-27T05:22:28Z

To couple the types for the return of variance and std_dev I have
Type Var: Covariance and the return type of std_dev is <Var as Covariance>::V

Perhaps I should be using the return of variance as the type from dense, Covariance::M, but the other option is not constraining a type relationship between variance and std_dev by removing std_dev or adding another associated type (which I think is unnecessarily flexible).

EDIT: on mobile, but within 24h I can push the WIP commit. It won't compile, but it's where I first noticed it; was while implementing moments for the F-distribution.

YeungOnion mentioned this issue Sep 24, 2024

0.18 release #292

Open

YeungOnion added this to the 0.18 milestone Sep 24, 2024

YeungOnion mentioned this issue Sep 25, 2024

Efficient access to private data #244

Open

YeungOnion modified the milestones: 0.18, Future, 0.19 Dec 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Traits in `statistics::traits` should not require return type be `Option` #295

Traits in `statistics::traits` should not require return type be `Option` #295

YeungOnion commented Sep 18, 2024

YeungOnion commented Sep 18, 2024

FreezyLemon commented Sep 20, 2024

YeungOnion commented Sep 26, 2024

Note for implementors

FreezyLemon commented Sep 26, 2024 •

edited

Loading

YeungOnion commented Sep 27, 2024 •

edited

Loading

Traits in statistics::traits should not require return type be Option #295

Traits in statistics::traits should not require return type be Option #295

Comments

YeungOnion commented Sep 18, 2024

proposed solution

Footnotes

YeungOnion commented Sep 18, 2024

FreezyLemon commented Sep 20, 2024

Footnotes

YeungOnion commented Sep 26, 2024

Note for implementors

Footnotes

FreezyLemon commented Sep 26, 2024 • edited Loading

YeungOnion commented Sep 27, 2024 • edited Loading

Traits in `statistics::traits` should not require return type be `Option` #295

Traits in `statistics::traits` should not require return type be `Option` #295

FreezyLemon commented Sep 26, 2024 •

edited

Loading

YeungOnion commented Sep 27, 2024 •

edited

Loading