Allow Categorical to have different bounds than 1, ncategories? #449

Closed
jw3126 opened this issue Jan 21, 2016 · 8 comments

Comments

@jw3126
Contributor

jw3126 commented Jan 21, 2016

How about adding an additional field to the Categorical type, with default value 1? What I have in mind is a field Categorical.min such that

  • minimum(c) = c.min
  • maximum(c) = c.min + ncategories(c) - 1
  • samples are drawn from minimum(c):maximum(c) and not from 1:ncategories(c)
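
As a rough illustration of the three bullets above, here is a minimal sketch in plain Julia. The type name `OffsetCategorical` and its fields are hypothetical and are not part of Distributions.jl; the sketch only assumes that the probability vector sums to 1.

```julia
# Hypothetical type for illustration; not part of Distributions.jl.
struct OffsetCategorical
    min::Int              # smallest value in the support
    p::Vector{Float64}    # p[k] is the probability of the value min + k - 1
end

# Default offset of 1 reproduces the usual 1:ncategories support.
OffsetCategorical(p::Vector{Float64}) = OffsetCategorical(1, p)

ncategories(c::OffsetCategorical) = length(c.p)
Base.minimum(c::OffsetCategorical) = c.min
Base.maximum(c::OffsetCategorical) = c.min + ncategories(c) - 1

# Draw one sample by walking the cumulative probabilities.
function Base.rand(c::OffsetCategorical)
    u = rand()
    acc = 0.0
    for (k, pk) in enumerate(c.p)
        acc += pk
        u < acc && return c.min + k - 1
    end
    return maximum(c)  # guard against floating-point round-off
end
```

With `c = OffsetCategorical(-2, [0.2, 0.3, 0.5])`, `minimum(c)` is -2, `maximum(c)` is 0, and `rand(c)` returns a value in -2:0.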

There would probably be several benefits; let me just describe the one that motivates me.
What I want to do is create new distributions out of old ones. For example:

  • The distribution of the minimum or maximum of n draws from one or several distributions
  • The distribution of the sum/difference of draws from two or more distributions
  • etc.

In some cases (e.g. adding two binomials with the same p) one can do so analytically, but more often than not I end up with a distribution which has no better description than a value range and a probability vector.
But this distribution may not be representable as a Categorical, because it can assume 0 or even negative values!
I feel it would be awkward to introduce a new type for this kind of thing and would love to use Categorical instead.
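
To make the "value range and probability vector" description concrete, here is a small sketch of how the sum of two independent draws could be computed. The function name `sum_distribution` and the (min, p) representation are assumptions for illustration only, not an existing API.

```julia
# Each distribution is a (min, p) pair: p[k] is the probability of min + k - 1.
# Returns the (min, p) pair describing the sum of two independent draws.
function sum_distribution(min1::Int, p1::Vector{Float64},
                          min2::Int, p2::Vector{Float64})
    p = zeros(length(p1) + length(p2) - 1)
    for (i, a) in enumerate(p1), (j, b) in enumerate(p2)
        # (min1 + i - 1) + (min2 + j - 1) sits at index i + j - 1 of the result
        p[i + j - 1] += a * b
    end
    return min1 + min2, p
end

# Difference of two fair 0/1 draws: negate the second draw, so its support
# becomes -1:0 with the probability vector reversed.
lo, p = sum_distribution(0, [0.5, 0.5], -1, reverse([0.5, 0.5]))
# lo == -1 and p == [0.25, 0.5, 0.25], i.e. the support is -1:1
```

The result starts at -1, which is exactly the kind of support that a Categorical fixed to 1:ncategories cannot describe.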

@johnmyleswhite
Member

This is pretty much exactly the use case for location-scale families, which people are already working on.

@jw3126
Contributor Author

jw3126 commented Jan 22, 2016

Ah, thanks, I see. For what I want to do, I would prefer a discrete version of UnivariateLocationScaleFamily. E.g. if the math preserves discreteness (sum, max, ...), the code should too. Also, iterating constructions is much cleaner if the type does not forget discreteness along the way.
Would it be reasonable to also have a discrete UnivariateLocationScaleFamily, or does such a thing already exist somewhere?

@johnmyleswhite
Member

Yes, it's totally reasonable. I think one of the essential things we have to get right is making location and scale families respect discreteness when appropriate.

@jw3126
Contributor Author

jw3126 commented Jan 23, 2016

So we should actually have two types:

DiscreteUnivariateLocationScaleFamily{T <: UnivariateDistribution} <: DiscreteUnivariateDistribution
ContinuousUnivariateLocationScaleFamily{T <: UnivariateDistribution} <: ContinuousUnivariateDistribution

(Maybe the first type should be parametrized by discrete T only.) And should we respect discreteness whenever it can be proven by type reasoning? E.g.

Binomial + 1 discrete (because discrete + int is always discrete)
Binomial + 1.0 continuous (because discrete + float might be continuous)
Normal * 0 continuous (because continuous * int might be continuous)
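
The three cases above could be expressed with multiple dispatch, roughly as in the following sketch. All names here (`Dist`, `DiscreteDist`, `ContinuousDist`, `result_kind`) are hypothetical stand-ins chosen for illustration; none of them is Distributions.jl API.

```julia
# Hypothetical stand-in types; not Distributions.jl API.
abstract type Dist end
struct DiscreteDist <: Dist end      # plays the role of e.g. Binomial
struct ContinuousDist <: Dist end    # plays the role of e.g. Normal

# Kind of the shifted or scaled distribution, decided from argument types only.
result_kind(::DiscreteDist, ::Integer) = :discrete    # discrete + int is always discrete
result_kind(::DiscreteDist, ::Real) = :continuous     # discrete + float might be continuous
result_kind(::ContinuousDist, ::Real) = :continuous   # no value inspection, so Normal * 0 stays continuous
```

Because the decision uses types only, `result_kind(ContinuousDist(), 0)` is `:continuous` even though multiplying by the value 0 would give a degenerate distribution.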

@adityam

adityam commented Apr 27, 2016

Another option is to pass two parameters to Categorical: probabilities and values (and have a constructor that takes only one parameter, probabilities, and initializes values to [1:length(probabilities)]). For example:

d = Categorical([0.2, 0.3, 0.5], [-5,0,5])

will generate -5 with probability 0.2, 0 with probability 0.3, and 5 with probability 0.5.
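
A minimal sketch of what such a two-parameter type might look like follows. The name `ValuedCategorical`, its fields, and the helper `pmf` are hypothetical and only illustrate the proposal; this is not how Categorical is defined in Distributions.jl.

```julia
# Hypothetical type for illustration; not the Distributions.jl Categorical.
struct ValuedCategorical{T}
    p::Vector{Float64}   # probabilities, assumed to sum to 1
    values::Vector{T}    # values[k] is produced with probability p[k]
    function ValuedCategorical(p::Vector{Float64}, values::Vector{T}) where {T}
        length(p) == length(values) ||
            throw(ArgumentError("p and values must have equal length"))
        new{T}(p, values)
    end
end

# One-parameter constructor: values default to 1:length(p).
ValuedCategorical(p::Vector{Float64}) = ValuedCategorical(p, collect(1:length(p)))

# Probability mass at x; anything outside the listed values has mass 0.
function pmf(d::ValuedCategorical, x)
    k = findfirst(isequal(x), d.values)
    return k === nothing ? 0.0 : d.p[k]
end
```

With `d = ValuedCategorical([0.2, 0.3, 0.5], [-5, 0, 5])`, `pmf(d, -5)` is 0.2 and `pmf(d, 1)` is 0.0.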

@cstjean

cstjean commented Nov 1, 2016

d = Categorical([0.2, 0.3, 0.5], [-5,0,5])

+1; I've got an application that would work great with Categorical([0.2, 0.3, 0.5], ["apple", "orange", "kiwi"]). Would such a PR be considered? Obviously, the mean, variance, etc. are not computable on such non-numerical distributions. This would solve #147 if I understand correctly.

@andreasnoack
Member

Related to #634

@matbesancon
Member

I would consider this closed with #634 then
