Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Getting the domain/support of objects, and domain type hierarchy #146

Open
oschulz opened this issue Nov 20, 2023 · 61 comments
Open

Getting the domain/support of objects, and domain type hierarchy #146

oschulz opened this issue Nov 20, 2023 · 61 comments

Comments

@oschulz
Copy link

oschulz commented Nov 20, 2023

We currently don't have a common function to query the domain or support of something (a distribution, measure, ML model, etc.) yet, right? We could add something like getdomain(something) and getsupport(some_function)

For functions, the situation is more tricky since their domain/support will often not only depend on the argument type, but also on the actual shape of the argument (array sizes, etc.). We could do getdomain(f, shp::ValueShapes.AbstractValueShape) for the kind of stuff ValueShapes currently covers (could extend that).

Distributions currently offers support, but it's limited to univariate distributions. Having getdomain/getsupport in DomainSets would allow packages to add support for complex types of domains in a clean fashion without depending on DomainSets.

Would a getdomain/getsupport API be welcome in DomainSets (I could do a PR)?

(Edit: Also some discussion about domain/set type hierarchies further down.)

@dlfivefifty
Copy link
Member

For functions, the situation is more tricky since their domain/support will often not only depend on the argument type, but also on the actual shape of the argument (array sizes, etc.)

Can you give a concrete example?

The ones I can think of (eg matrix functions with sqrt(A)) have a very complicated domain and so one would need a special type to represent the domain that could also have variable dimensions cooked in, eg for domain(sqrt, ::Symmetric{Float64,Matrix{Float64}}) would return something like SymmetricSemipositiveDefiniteDomain{Float64}().

@daanhb
Copy link
Member

daanhb commented Nov 20, 2023

Distributions currently offers support, but it's limited to univariate distributions. Having getdomain/getsupport in DomainSets would allow packages to add support for complex types of domains in a clean fashion without depending on DomainSets.

Hmm, I don't quite get one you could extend a function defined in DomainSets without depending on DomainSets (since you say "without depending on"). I'm all for the idea, but how do you see this?

@oschulz
Copy link
Author

oschulz commented Nov 20, 2023

@daanhb : Hmm, I don't quite get one you could extend a function defined in DomainSets without depending on DomainSets (since you say "without depending on"). I'm all for the idea, but how do you see this?

Sorry, I should have said "without a hard dependency in it". I meant via package extensions (falling back to a direct dependency or Requires on Julia <v1.9).

@oschulz
Copy link
Author

oschulz commented Nov 20, 2023

For functions, the situation is more tricky [...]
@dlfivefifty : Can you give a concrete example?

I was thinking about functions like sum, normalize, var and so on, functions that can take lots of different inputs, and are often extended by other packages to even more inputs. And then there's functions like identity that will accept anything ...

For functions like log, exp, relu, etc. it would be very easy to define getdomain(f) and getsupport(f), of course. Maybe limiting ourselves to such cases would be enough for now? Would be helpful to add domains like and ℝ₊ for that.

@dlfivefifty
Copy link
Member

I think you need to include the type in domain and support (Btw I do not like the use of "get" as its not really used in Julia, apart from getindex)

For example, the domain of sqrt depends on whether its real or complex. Eg I'd imagine something like the following (@daanhb why is limited to ComplexF64?):

domain(::typeof(sqrt), ::T) where T<: Real = HalfLine{T}()
domain(::typeof(sqrt), ::T) where T<: Complex =/ NegativeOpenHalfLine{T}()

@oschulz
Copy link
Author

oschulz commented Nov 20, 2023

Btw I do not like the use of "get" as its not really used in Julia, apart from getindex

How about domainof(obj) and supportof(obj) then?

I think you need to include the type [...] For example, the domain of sqrt depends on whether its real or complex.

I had that in my initial draft of this proposal, but then I thought that maybe giving the tpe is too much and too little - for distributions and measures, we don't need it, they have a well-defined and unique domain/support. For sqrt adding the type would help, but for functions that take values that include arrays one would need both type and size (ValueShapes offers a solution here, but it' s currently limited in what it can handle).

So we could, initially, limit ourselves to cases that are clear-cut? Or we could do domainof(obj, T) and allow, e.g. for distributions and measures, something like domainof(obj) = domainof(obj, testvalue(obj))?

@dlfivefifty
Copy link
Member

The analogue of "domain" for arrays is just called axes. What's wrong with just using domain?

@oschulz
Copy link
Author

oschulz commented Nov 20, 2023

What's wrong with just using domain?

I'm not against just domain. But I think we should also have something to get support, and Distributions already "owns" support. Also, I could see people wanting to use domain as a variable name in code that deals with domains, so maybe domainof wouldn't be a bad choice?

@dlfivefifty
Copy link
Member

I think let users use namespaces (DomainSets.domain etc) if they need to avoid confusion. Users also use size, axes, length etc. as names for variables so this isn’t a good argument

I think we could move Distributions.support and domain to a simple package (DomainsCore or something) that just defines these two functions.

@daanhb
Copy link
Member

daanhb commented Nov 20, 2023

Ownership of functions is certainly an issue here. Is there any consensus about using package extensions rather than packages like DomainSetsCore or JuliaApproximationCore? I'm starting to think we could meaningfully do with both (and that support or whatever it's called eventually should go into the latter and not the former.)

@oschulz
Copy link
Author

oschulz commented Nov 20, 2023

I think we could move Distributions.support and domain to a simple package (DomainsCore or something) that just defines these two functions.

In general I like that approach, having lightweight API-only packages, I think it enables us to take full advantage of multiple dispatch across packages. In this specific case though, I'm not sure - Distributions would then actually have to have a hard dependency on DomainSets - it couldn't implement domain(d) via a Pkg extension, since that would lead to unpredictable behaviour: domain(d) could only return a result if DomainSets is loaded, but users could request domain(d) without loading it.

I guess Distributions wouldn't accept DomainSets as a dependency, because it pulls in StaticArrays, which Distributions has avoided so far.

I think let users use namespaces (DomainSets.domain etc) if they need to avoid confusion.

I agree in general, but with major packages like Distributions I think one should try to avoid name clashes.

@dlfivefifty
Copy link
Member

are you worried about type piracy? It’s a pretty standard idiom for packages to implement functionality defined in a Core package. So donain(d) would just error and say “load DomainSets.jl” if it’s not loaded

@oschulz
Copy link
Author

oschulz commented Nov 20, 2023

are you worried about type piracy? It’s a pretty standard idiom for packages to implement functionality defined in a Core package.

Yes, that wouldn't be a problem, since there would be a clear contract between "DomainSetsCore" and DomainSets.

So donain(d) would just error and say “load DomainSets.jl” if it’s not loaded

But it wouldn't always, from the user point of view: not if DomainSets is loaded implicitly via some other dependency path. And if DomainSets should vanish from that (possibly deep) dependency path, then the code that worked before would break, even if upper version bounds are used correctly. I know we have this in some corners of the ecosystem, but I don't think it's a good pattern. We do require code to state all of it's dependencies explicitly in Julia (unlike Python), and I think that has been very beneficial to the stability of the ecosystem.

@oschulz
Copy link
Author

oschulz commented Nov 20, 2023

How about this (if the Distributions maintainers are Ok with it):

  • We add domain and support here, DomainSets owns them.

  • Distributions adds support via a Pkg extension (same for MeasureBase and possibly other packages). Distributions deprecates it's own support immediately, pointing users to DomainSets.support. Distributions.support returns the same as before for the distributions where it works now (basically just univariate dists), the DomainSets.support will return the same for these.

  • At the next breaking release of Distributions (may be a quite a while), Distributions removes it's own support function.

CC @devmotion, I think we should ask you for some input here as well.

@daanhb
Copy link
Member

daanhb commented Nov 27, 2023

It seems Distributions.jl has its own interval type called RealInterval. That could be implemented using just IntervalSets, which is very lightweight.

Somewhere in the near future possibly both IntervalSets and DomainSets will depend on DomainSetsCore, which actually defines a function called domain for now.

@oschulz
Copy link
Author

oschulz commented Nov 27, 2023

Somewhere in the near future possibly both IntervalSets and DomainSets will depend on DomainSetsCore, which actually defines a function called domain for now.

Thanks @daanhb , that's great! Could DomainSetsCore also provide a function support or similar?

This got me thinking, though - just a proposal, I don't mean to be presumptuous: if we change a few things around anyway, should we maybe start dropping the term "domain" when it comes to the type of the sets themselves, in the future? And use terms like "domain" and "support" only in the name of functions that get such sets for objects/functions?

At the moment, we have RealLine <: FixedInterval <: TypedEndpointsInterval <: AbstractInterval <: Domain. But mathematically, the set of real numbers by itself is just a set. It is the domain of some functions, and the support of some functions, and it's the co-domain of some functions. But the real numbers, intervals, etc. are not domains intrinsically, by themselves. It's not what they are, it's a role they play in certain contexts. Maybe we could rename Domain to something like InfiniteSet at the top of the type hierarchy? It could possibly even be a subtype of Base.AbstractSet - after all, in some cases we might want to use a finite Base.Set for domains, supports and co-domains.

@dlfivefifty
Copy link
Member

I like the name InfiniteSet. The problem with InfiniteSet <: AbstractSet is that types in Julia are more about conforming to an interface than mathematical definitions. Will an InfiniteSet actually conform to the rules governing AbstractSet? And these rules are not really written down anywhere...

Note I added support to Infinite arrays in Julia see

https://github.com/JuliaLang/julia/blob/master/test/testhelpers/InfiniteArrays.jl

So I'm all for allowing infinite sets but we just need to be more careful and possibly add a testhelper to ensure the expected functionality is maintained.

@daanhb
Copy link
Member

daanhb commented Nov 27, 2023

Hmm, I did not see a discussion about names coming up here. Points, to name just one example, are not infinite sets. And if I may short-circuit the discussion, inheritance from AbstractSet seems to bring mostly complications without tangible benefits. (The "is a" relationship between objects is not a good enough reason for inheritance, in my experience so far with Julia. I agree with @dlfivefifty that similar behaviour, i.e. adhering to an interface, matters more for generic programming.)

The word domain is used in many meanings, only one of which is "the domain of a function". The word is in itself also associated with sets. Some take it to mean connected open subsets, though even then people also talk about closed domains. In other contexts one might solve a PDE "on a domain", in which case it could be nearly anything. In complex analysis there are regions (e.g. there is a ComplexRegions package). And so on. There is no single good word.

Back to the issue at hand: I think that while DomainSets could be used to describe the domain or the support of a function (which to me is a subset of its domain on which it is nonzero), it should not define the concept as the package itself is not about functions. It is a coincidence that a function with the same name "domain" is defined in DomainSetsCore. I see no conceptual inconsistency in extending the meaning of that function to mean "the domain of a function" when applied to function objects. A function like "support" on the other hand might live in JuliaApproximationCore :-)

@devmotion
Copy link

How about this (if the Distributions maintainers are Ok with it):

* We add `domain` and `support` here, DomainSets owns them.

* Distributions adds support via a Pkg extension (same for MeasureBase and possibly other packages). Distributions deprecates it's own `support` immediately, pointing users to `DomainSets.support`. Distributions.support returns the same as before for the distributions where it works now (basically just univariate dists), the `DomainSets.support` will return the same for these.

* At the next breaking release of Distributions (may be a quite a while), Distributions removes it's own `support` function.

My feeling is that support (or domain or whatever its name is) is such a fundamental part of Distributions that it shouldn't be made optional and shouldn't depend on what packages users load (im- or explicitly).

@oschulz
Copy link
Author

oschulz commented Nov 27, 2023

Thanks for the clarifications @daanhb !

Hmm, I did not see a discussion about names coming up here.

No, sorry - maybe I should have opened a separate issue. In any case, I certainly didn't mean to seem pushy here.

I was just thinking about how we can express things like domains and supports better in packages like MeasureBase/MeasureTheory (they currently use some custom types). And about possible proposals in regard to Distributions to get more consistency in this area. IntervalSets + DomainSets seemed like a good fit.

In any case, from there on I just thinking about "is a" vs. "role in relationship to" in regard to the terms "domain" vs. "set" in the type hierarchy. And I saw that the Readme begins with "DomainSets.jl is a package designed to represent simple infinite sets". So I thought maybe the term "domain" in the type hierarchy had historical reasons.

I agree with @dlfivefifty that similar behaviour, i.e. adhering to an interface, matters more for generic programming

Sure, generic programming in Julia doesn't strictly need supertypes. Taking full advantage of multiple dispatch, though, in my experience, gets much easier with common supertypes (where would we be without AbstractArray?) - especially in larger (systems of) packages/applications. Just my personal view, I know this is sometimes a contentious topic. Certainly subtyping Base classes can also get tricky and result in complex dispatch decisions - I had hoped AbstractSet would be "benign" in this respect (there are some generic methods in Base that seem to assume that an AbstractSet is finite, but maybe not too many to handle?). But I agree, it's something that has to be weighed carefully.

Again, I certainly didn't mean to push too far here.

A function like "support" on the other hand might live in JuliaApproximationCore

In other contexts (InverseFunction) we've been talking about the need for more central APIs to query function properties, like maybe a "FunctionTraits" package, for functions like ismonotonic, islinear, iscontinuous and so on. Maybe such a package could be a good place for function to query domain and support as well? In contrast to questions about linearity, which have a yes/no answer, we should have a community consensus on what types should be returned here, though. Do you think it's within the scope of DomainSets to play such a role (where applicable)?

@oschulz
Copy link
Author

oschulz commented Nov 27, 2023

@devmotion : My feeling is that support (or domain or whatever its name is) is such a fundamental part of Distributions that it shouldn't be made optional and shouldn't depend on what packages users load (im- or explicitly).

Thanks for the feedback @devmotion ! Question: Are there currently any plans to extend Distributions.support to multivariate distributions (and how to represent the result)? And would you feel comfortable with support being hosted in very lightweight central API-def package?

@oschulz
Copy link
Author

oschulz commented Nov 27, 2023

@dlfivefifty: Will an InfiniteSet actually conform to the rules governing AbstractSet? And these rules are not really written down anywhere.

Good point. We could suggest defining that more closely in the language docs? :-) So far I haven't seen anything to "exotic" with AbstractSet "in the field", so maybe it's not too late ... ;-)

Note I added support to Infinite arrays in Julia see

❤️

@daanhb
Copy link
Member

daanhb commented Nov 28, 2023

In other contexts (InverseFunction) we've been talking about the need for more central APIs to query function properties, like maybe a "FunctionTraits" package, for functions like ismonotonic, islinear, iscontinuous and so on. Maybe such a package could be a good place for function to query domain and support as well? In contrast to questions about linearity, which have a yes/no answer, we should have a community consensus on what types should be returned here, though. Do you think it's within the scope of DomainSets to play such a role (where applicable)?

It would be great to see some common ground here! Apart from interfaces there is a lot of duplicated effort, including in DomainSets, especially for affine maps. I was hoping to be able to move everything related to functions and maps into a separate package (#92), or to use existing packages.

@daanhb
Copy link
Member

daanhb commented Nov 28, 2023

Sure, generic programming in Julia doesn't strictly need supertypes. Taking full advantage of multiple dispatch, though, in my experience, gets much easier with common supertypes (where would we be without AbstractArray?) - especially in larger (systems of) packages/applications. Just my personal view, I know this is sometimes a contentious topic. Certainly subtyping Base classes can also get tricky and result in complex dispatch decisions - I had hoped AbstractSet would be "benign" in this respect (there are some generic methods in Base that seem to assume that an AbstractSet is finite, but maybe not too many to handle?). But I agree, it's something that has to be weighed carefully.

Sure, dispatching on a common supertype is beneficial, as is being able to reuse code written for abstract types. I just think both of these advantages are limited in the context of domains, and we've actively been trying to get away from having a common type. (It would be a funny way to settle the eltype debate.) Yet it's good to look at the methods in Base for sets, thanks - it seems we've been missing out on opportunities to be more consistent (like using "issetequal", which seems spot on).

@oschulz
Copy link
Author

oschulz commented Nov 28, 2023

It would be great to see some common ground here! Apart from interfaces there is a lot of duplicated effort, including in DomainSets, especially for affine maps.

I'm so with you, here! I wish we have a AbstractLinearOperator <: AbstractMultiplicativeOperator, that people would agree to use, in a central place. Then we wouldn't have to use untyped arguments in algorithms (like solvers) that accept "anything that does a linear mul" all the time (resulting in later failure if users pass the wrong thing with less clear error messages, less possible specialization, and so on).

@oschulz
Copy link
Author

oschulz commented Nov 28, 2023

context of domains, and we've actively been trying to get away from having a common type

Do we really have to, though? Like @dlfivefifty says, we even have infinite arrays now - and AbstractArray make a lot of assumptions, and has a lot of generic default implementations. Maybe it would be worth a try to see if we can stretch AbstractSet to infinite sets, and other special sets, without getting into (too much) dispatch trouble? (Or have you guys been down that road before and it didn't work out, maybe?)

I'm just thinking about how far AbstractArray has carried us - we have arrays that you can only read (like mapped arrays), arrays without an efficient single-element getindex (like GPU arrays), arrays without a fixed or even finite size, and so on - but the fact that they're all AbstractArrays has enabled an incredible amount of generic code. If we had independent type hierarchies, it would be a mess of Unions, even more package extensions, and so on - just my take. That why I wonder if we shouldn't give AbstractSet a chance here, so to speak.

There are other cases where we don't have a common supertype, for good reason. Common supertypes for iterable objects for example - and we shouldn't have one, iterability isn't always the primary property of an object. So iterate is "just" a (very successful) API without a type hierarchy.

So I fully agree: Just because objects implement a common API that doesn't automatically make them candidates for a common type hierarchy. We don't have multiple inheritance or similar in the language and should choose super-types very carefully. But, to take full advantage of multiple dispatch we should build common type hierarchies where they also make sense semantically, and where we don't run into "but it's just a much an A as a B" situations.

In this case I would argue that the primary property of a set, and the domains and things we talk about here, is their, well, "set-likeness". :-) That's why, IMHO, building down from AbstractSet would make sense not just technically, but also semantically.

@oschulz oschulz changed the title Function to get the domain/support of objects? Getting the domain/support of objects, and domain type hierarchy Nov 28, 2023
@dlfivefifty
Copy link
Member

The issue is that looking at https://github.com/JuliaLang/julia/blob/master/base/abstractset.jl there's basically no code that will still work for infinite sets 😅

Here's an experiment to detect the interface:

julia> struct MySet <: AbstractSet{Int} end

julia> Set(MySet())
ERROR: MethodError: no method matching length(::MySet)

julia> Base.length(::MySet) = 1

julia> Set(MySet())
ERROR: MethodError: no method matching iterate(::MySet)

julia> Base.iterate(::MySet, k...) = iterate((1,), k...)

julia> Set(MySet())
Set{Int64} with 1 element:
  1

So the interface for AbstractSet is:

  1. length
  2. iterate
  3. in (probably.... not needed in the example above)
  4. eltype (this is automatic)

So the issue for domains is that we lose (2) and potentially (4) no longer makes sense. We could make a major PR into Julia to add overloading say intersect or union when length is infinite... I think that will be more difficult than the minimal changes I did for InfiniteArrays.jl (and there were complaints about it being merged when some Julia developers realised I had added Base.oneto without them knowing!)

We could take the ArrayLayouts.jl approach and have a parallel implementation of everything. I'm not sure I'd recommend it either.

@oschulz
Copy link
Author

oschulz commented Nov 28, 2023

@dlfivefifty: there's basically no code that will still work for infinite sets 😅

Sure, but abstract types don't necessarily need to provide generic implementations/methods for the functions associated with them, right? What there should be, is contracts/semantics of the interface (and yes, we should write those down explicitly way more often :-) ).

If you build an InfiniteArray like the MySet in the example above, pretty much none of the default methods will work either. ;-)

And sure, infinite sets won't support length and iterate, but neither do infinite arrays (at least not iteration in finite time), and usually length and iteration are assumed be quite fundamental for arrays, I'd say. :-) But what all arrays that we have do support is random access: getindex (of single elements or ranges of elements). It's what makes them fundamentally different from sequences.

I would say it's very similar for sets: Finite sets have a length and support iteration, infinite sets don't. But what they all share, what makes them sets, is the ability to check if they contain an element in a (somewhat) efficient manner: Base.in. I'd say it is for sets what getindex is for arrays. Arrays have a (somewhat) efficient getindex, sets have a (somewhat) efficient in. At least that how I would define it.

And of course we'll want to support operations like union - where possible. We won't be able to implement that for all possible combinations of sets in a sensible way, but then we can't implement vcat for all possible combinations of arrays very sensibly either.

(I should actually be pretty easy to add a generic union implementation for AbstractSets to Base, just using in, and retrofit it via Compat. Generic unions could even be made iterable if their constituents are. Generic intersects would also be easy, no way to get default iteration there, though.)

(We could even define iteration for certain infinite sets, for hypercubes for example, we could use quasirandom sequences like Sobol to implement infinite iteration. That could actually be useful in practice.)

@daanhb
Copy link
Member

daanhb commented Nov 28, 2023

Do we really have to, though? Like @dlfivefifty says, we even have infinite arrays now - and AbstractArray make a lot of assumptions, and has a lot of generic default implementations. Maybe it would be worth a try to see if we can stretch AbstractSet to infinite sets, and other special sets, without getting into (too much) dispatch trouble? (Or have you guys been down that road before and it didn't work out, maybe?)

You raise good points but one can argue the other way here. It's great that the community can agree on using AbstractArray as a common supertype, but it has side-effects. Plenty of objects can be thought of as an array in a sense, yet have another more logical supertype. Images come to mind (this goes way back). Not inheriting from AbstractArray makes it hard to use the nice indexing logic that is in Base.

@oschulz
Copy link
Author

oschulz commented Nov 28, 2023

Plenty of objects can be thought of as an array in a sense, yet have another more logical supertype. Images come to mind (this goes way back)

Yes, that's a good example - for an Image (though the current approach is certainly practical) one wouldn't necessarily say that it's primary property is that is an array of pixels. An alternative design could, for example, be images as subtypes of AbstractImage (which wouldn't be an array) and then functions like getpixels (returning and array of color values), getmetadata, and so on. Just one possibility.

I would argue that the situation is different here though - what we really want here is sets, in the mathematical sense, first and foremost. An interval is a set (hence "IntervalSets"), so why not make is a subtype of AbstractSet. The real numbers are a set. A hypercube is a set (since in our context we're interested in it's "contents", not it's surface).

many packages out there that define triangles and rectangles do not actually have a definition of in

True, though maybe they should, in some cases. But it depends, if it's about 3D geometries, one may be interested in the surface or the volume, so in on the object might be unclear. So maybe there one would want x in volumeset(obj) and x in surfaceset(obj) or so?

But looking at what InvervalsSets and DomainSets currently do, I would say that the package names, documentation and use make it clear that this is really about sets directly, and that DomainSets.Cube means the volume of the cube. And we do define in everywhere. To me that the types here say "I could happily be an AbstractSet". :-)

@daanhb
Copy link
Member

daanhb commented Nov 28, 2023

But looking at what InvervalsSets and DomainSets currently do, I would say that the package names, documentation and use make it clear that this is really about sets directly, and that DomainSets.Cube means the volume of the cube. And we do define in everywhere. To me that the types here say "I could happily be an AbstractSet". :-)

Valid point, the types in DomainSets could be.

@daanhb
Copy link
Member

daanhb commented Nov 28, 2023

We could take the ArrayLayouts.jl approach and have a parallel implementation of everything. I'm not sure I'd recommend it either.

Care to elaborate on that?

@oschulz
Copy link
Author

oschulz commented Nov 28, 2023

Back to AbstractSet: we've established that little to no code in Base is reusable

I think that's not a problem. To me, the main purpose of abstract type is not maximum code reuse, and certainly not at the highest level. Sure, it's good have generic implementations for many functionalities directly for the abstract type, but it's not a must. And I would definitely suggest that we propose a more generic implementation for at least union and intersect in Base and Compat, that should be fairly easy without breakages.

that sets have a unique element type

Personally I would say it doesn't kill the idea - it's a general problem we have in many areas after all. I would say defaulting to Float64 resp. Int or so is workable, in practical use. We'll typically ask questions to sets, not generate based on them. It's not perfect, but having no information on type isn't ideal either ... maybe the pragmantic approach is good enough.

we violate the assumption of sets being finite

True, though maybe that assumption can change. It has worked out well with arrays, I'd say that's encouraging.

we violate the assumption that sets have a unique element type.

Yes, that's indeed a bit more tricky - is touches on the old "parameter type vs. generated type and what should eltype be debate" for Distribution, Sampleable and so on.

So what's left? I have a hard time even thinking of a case where one might want to dispatch on "being a set"

One very easy example is distributions/measures. Many (log-)pdf/density methods first check if the point with within the support. But in complex systems, one may want to pass that support on, so having a common type is good (so function arguments don't always have to be untyped - types carry meaning, also to the reader of the code).

The one thing sets share is "in", which is the one thing that can never have a generic implementation.

Yes, exactly - just like getindex is the one thing you always have to define for new array types. We can have a generic union and intersect though, and maybe a few other things as well.

Playing advocate of the devil here, not bashing.

Sure, and I agree we should turn this every possible way and check very carefully if make things AbstractSets is sound and safe.

@oschulz
Copy link
Author

oschulz commented Nov 28, 2023

@hyrodium and @aplavin , can I pull in you here for an "IntervalSets" perspective? The thread above is kinda long, but the gist is (apart from the question where query functions like domain and support could best be defined) whether it wouldn't make sense to put Base.AbstractSet at the top of the IntervalsSets and DomainSets type hierarchies (and maybe add a lightweight package "InfiniteSets" or so as an indermediate).

@daanhb
Copy link
Member

daanhb commented Nov 28, 2023

Thanks for engaging the discussion :-)

Back to AbstractSet: we've established that little to no code in Base is reusable

I think that's not a problem. To me, the main purpose of abstract type is not maximum code reuse, and certainly not at the highest level. Sure, it's good have generic implementations for many functionalities directly for the abstract type, but it's not a must. And I would definitely suggest that we propose a more generic implementation for at least union and intersect in Base and Compat, that should be fairly easy without breakages.

Alright, let's entertain the notion that a common implementation could be provided.

that sets have a unique element type

Personally I would say it doesn't kill the idea - it's a general problem we have in many areas after all. I would say defaulting to Float64 resp. Int or so is workable, in practical use. We'll typically ask questions to sets, not generate based on them. It's not perfect, but having no information on type isn't ideal either ... maybe the pragmantic approach is good enough.

Let's say I agree with `eltype(d::Domain{T}) == T' for pragmatic reasons

we violate the assumption of sets being finite

True, though maybe that assumption can change. It has worked out well with arrays, I'd say that's encouraging.

Has it? (genuine question)

So what's left? I have a hard time even thinking of a case where one might want to dispatch on "being a set"

One very easy example is distributions/measures. Many (log-)pdf/density methods first check if the point with within the support. But in complex systems, one may want to pass that support on, so having a common type is good (so function arguments don't always have to be untyped - types carry meaning, also to the reader of the code).

I can write

function my_personal_in(x, this_argument_should_be_a_domain)
   checkdomain(this_argument_should_be_a_domain)
   in(x, this_argument_should_be_a_domain)
end

and catch user errors and make my intention specific. I can still specialize the function for concrete domains that have a better implementation. Dispatch allows me to write another function like this:

function my_personal_in(x, this_argument_should_not_be_a_domain)
    # what is this function about?
end

but I can code my way around this. Using dispatch in this way arguably makes my code less readable, so I'd rather have less of this in packages than more.

The one thing sets share is "in", which is the one thing that can never have a generic implementation.

Yes, exactly - just like getindex is the one thing you always have to define for new array types. We can have a generic union and intersect though, and maybe a few other things as well.

Yet union and intersect currently work for AbstractArray which does not inherit from AbstractSet anyway. The notion that arguments to these functions are sets lies in the name of the functions, not in the types of the arguments.

I had missed that by the way. The issetequal function is semantically equivalent to what I've called isequaldomain.

@oschulz
Copy link
Author

oschulz commented Nov 28, 2023

@daanhb : True, though maybe that assumption can change. It has worked out well with arrays, I'd say that's encouraging.
Has it? (genuine question)

I haven't done much with InfiniteArrays.jl, but Chris Rackaukas told me recently the the SciML people are definitely making use of it now.

my_personal_in(x, this_argument_should_be_a_domain)

Yes, but I think my_personal_in(x, domain::AbstractSet) is much clearer. Where we can't use supertypes easily we should use a trait-like interface, but I think where we can sensibly use supertypes, there we should.

but I can code my way around this. Using dispatch in this way arguably makes my code less readable, so I'd rather have less of this in packages than more.

I agree.

Yet union and intersect currently work for AbstractArray which does not inherit from AbstractSet anyway.

Yes - I don't know if I'm happy about that ... but I don't think should keep us from using AbstractSet where is does fit well, semantically. :-)

I would say, the "primary" thing that makes them what they are is:

  • AbstractArray: Efficient random access Base.getindex (with some caveats, e.g. GPU arrays)

  • AbstractSet: Efficient check for membership - Base.in (caveats could apply with some special sets)

Those are also the primary functions that always have to be implemented for subtypes, so I think that fits together.

@aplavin
Copy link
Contributor

aplavin commented Nov 28, 2023

@oschulz I don't see how stuff like intervals being AbstractSets would help or simplify anything. One would still need a wrapper function like domain(collection) = DiscreteDomain(collection) – because the most common collection types are arrays, they make basically as much sense in this context as Base.Set does, but they definitely aren't <: AbstractSets.

And if such a function is needed anyway, seems more straightforward to tell users to use domain(Set(...)) when a Set should be interpreted as a Domain.

@daanhb
Copy link
Member

daanhb commented Nov 28, 2023

I haven't done much with InfiniteArrays.jl, but Chris Rackaukas told me recently the the SciML people are definitely making use of it now.

Sure, I have no doubt it is useful, I also use it. We're wondering here about the necessity of inheritance from AbstractArray. What is achieved that can not be achieved by not inheriting from AbstractArray? That was my genuine question, as it may help us here.

@oschulz
Copy link
Author

oschulz commented Nov 28, 2023

What is achieved that can not be achieved by not inheriting from AbstractArray? That was my genuine question, as it may help us here.

Ah - I would say passing it through a lot of generic code that doesn't require arrays to be finite sized, but that does require an AbstractArray in it's interfaces.

If we make get rid of all type hierarchies, we lose the power of multiple dispatch, resp. have to write everything trait-style which will be a lot less readable and a lot more complex too, in the long run. And people will constantly have to define all kinds of Unions, but those can't be extended by other packages, so we (at least partially) lose our solution to the expression problem.

@daanhb
Copy link
Member

daanhb commented Nov 28, 2023

Ah - I would say passing it through a lot of generic code that doesn't require arrays to be finite sized, but that does require an AbstractArray in it's interfaces.

Right, perhaps this wasn't the best of examples. I'll try to think of something more constructive, such as a generic fallback for intersect to replace this one. But right now my plane has arrived :-)

@oschulz
Copy link
Author

oschulz commented Nov 28, 2023

Safe flight! :-)

When you're on the ground again, maybe an argument in the other direction: I think we went through at least a few advantages of subtyping AbstractSet here. So, as kind of inverse devil's advocate, why not do it. I mean, what would be the risks or potential relevant disadvantages?

@aplavin
Copy link
Contributor

aplavin commented Nov 28, 2023

This is the list of methods taking AbstractSet in Base:

[1] <(a::AbstractSet, b::AbstractSet) @ Base abstractset.jl:484
[2] <=(a::AbstractSet, b::AbstractSet) @ Base abstractset.jl:485
[3] ==(a::AbstractSet, b::AbstractSet) @ Base abstractset.jl:481
[4] convert(::Type{T}, s::T) where T<:AbstractSet @ Base set.jl:550
[5] convert(::Type{T}, s::AbstractSet) where T<:AbstractSet @ Base set.jl:551
[6] copy!(dst::AbstractSet, src::AbstractSet) @ Base abstractset.jl:6
[7] empty(s::AbstractSet{T}) where T @ Base set.jl:67
[8] empty(s::AbstractSet{T}, ::Type{U}) where {T, U} @ Base set.jl:67
[9] filter(pred, s::AbstractSet) @ Base abstractset.jl:489
[10] hash(s::AbstractSet, h::UInt64) @ Base set.jl:542
[11] intersect(s::AbstractSet, itr) @ Base abstractset.jl:170
[12] intersect(s::AbstractSet, itr, itrs...) @ Base abstractset.jl:151
[13] intersect!(s::AbstractSet, s2::AbstractSet) @ Base abstractset.jl:192
[14] intersect!(s::AbstractSet, itr) @ Base abstractset.jl:193
[15] intersect!(s::AbstractSet, itrs...) @ Base abstractset.jl:186
[16] issetequal(a::AbstractSet, b::AbstractSet) @ Base abstractset.jl:427
[17] issetequal(a::AbstractSet, b) @ Base abstractset.jl:428
[18] issetequal(a, b::AbstractSet) @ Base abstractset.jl:430
[19] map(f, ::AbstractSet) @ Base abstractarray.jl:3294
[20] setdiff(s::AbstractSet, itrs...) @ Base abstractset.jl:212
[21] setdiff!(s::AbstractSet, itr) @ Base abstractset.jl:238
[22] setdiff!(s::AbstractSet, itrs...) @ Base abstractset.jl:232
[23] show(io::IO, ::MIME{Symbol("text/plain")}, t::AbstractSet{T}) where T @ Base show.jl:215
[24] sizehint!(s::AbstractSet, n) @ Base abstractset.jl:4
[25] summary(io::IO, t::AbstractSet) @ Base show.jl:209
[26] symdiff!(s::BitSet, ns::AbstractSet) @ Base bitset.jl:319
[27] symdiff!(s::AbstractSet, itr::AbstractSet) @ Base abstractset.jl:285
[28] symdiff!(s::AbstractSet, itr) @ Base abstractset.jl:283
[29] symdiff!(s::AbstractSet, itrs...) @ Base abstractset.jl:276
[30] union(s::AbstractSet) @ Base abstractset.jl:58
[31] union!(s::AbstractSet{T}, itr) where T @ Base abstractset.jl:101
[32] union!(s::AbstractSet, sets...) @ Base abstractset.jl:83
[33] ⊊(a::AbstractSet, b::AbstractSet) @ Base abstractset.jl:378
[34] ⊊(a::AbstractSet, b) @ Base abstractset.jl:379
[35] ⊊(a, b::AbstractSet) @ Base abstractset.jl:380
It's hard to find those in the list that could meaningfully be reused for something like an interval...

I think the decision to subtype shouldn't be based on the name of the type (it has "Set" in there and intervals are sets in maths), but on the semantic that follows from methods defined on such a type. Similar with Number and Real: there are lots of Julia types that wouldn't really be called "reals" in maths, but the behavior of this type in Julia does fit a goal well.

@oschulz
Copy link
Author

oschulz commented Nov 28, 2023

It's hard to find those in the list that could meaningfully be reused for something like an interval

Well, intersect is certainly possible, union too (but would return a "multi-interval"). in is of course central. empty can be meaningfully defined for intervals as well.

I would argue again the the criterion should not be if current generic methods for AbstractInterval in Base already work for what we need (also, those can be evolved) - but whether the semantics of the type and functions that are understood to work with it are a good fit.

I think the decision to subtype shouldn't be based on the name of the type (it has "Set" in there and intervals are sets in maths), but on the semantic that follows from methods defined on such a type.

I agree, it should just depend on the name, the actualy primary nature of the type should be the deciding factor, I think. And the semantics have to fit, but I don't think those are necessarily defined by what the (quite few) generic methods for AbstractSet currently do. Just because no one forsaw infinite sets when they where written doesn't imply to me that the supertype can't (after careful consideration) be understood to cover those too.

there are lots of Julia types that wouldn't really be called "reals" in maths, but the behavior of this type in Julia does fit a goal well.

Yes - and also here, quite a few of the generic method implementations in Base.Real have to be significantly changed for, e.g., Symbolics.Num.

You're right, with Real we often push it very far (though with great success). But I don't think that should keep us from making types that are clearly sets by natures subtypes of AbstractSet. I don't think the current documentation or implementation of AbstractSet clearly implies that this is only supposed to cover concrete finite sets. Quite the contrary, I think in contrast to the AbstractArray implementation in Base, it leaves it very open. And if we consider what we now make subtypes of AbstractArray - with immense success ... why should that be different for sets?

@hyrodium
Copy link

Mathematically speaking, everything must be a set in ZFC, so Any === AbstractSet would be reasonable. Just kidding.

I think the term "set" in AbstractSet is not a mathematical set, but a data type, so it would be natural not to have AbstractSet as a supertype of AbstractInterval and similar types.

@aplavin
Copy link
Contributor

aplavin commented Nov 28, 2023

Well, intersect is certainly possible, union too (but would return a "multi-interval"). in is of course central. empty can be meaningfully defined for intervals as well.

Are you sure you are talking about methods, not functions? I looked at a few of those, they don't seem feasible to generalize to intervals (ie make Set and Interval be handled with the same method). Mostly it's because of fundamental interface differences, but sometimes it's a question of semantics. I wouldn't really like this to be a comparison of intervals:

<=(a::AbstractSet, b::AbstractSet) = a ⊆ b

But, if subtyping AbstractSet, this should stay.

As for functions themselves, they are defined for way more types that AbstractSet – for example, arrays.

@oschulz
Copy link
Author

oschulz commented Nov 28, 2023

I think the term "set" in AbstractSet is not a mathematical set, but a data type,

But where does it say that?

@oschulz
Copy link
Author

oschulz commented Nov 28, 2023

@aplavin: I wouldn't really like this to be a comparison of intervals <=(a::AbstractSet, b::AbstractSet)

Uh, I hadn't seen that, that is ugly. Hm, in Base it says

# convenience functions for AbstractSet
# (if needed, only their synonyms ⊊ and ⊆ must be specialized)
<( a::AbstractSet, b::AbstractSet) = a ⊊ b
<=(a::AbstractSet, b::AbstractSet) = a ⊆ b

So < is meant to be a "less-unicody" notation for a ⊊ b and <=(a::AbstractSet, b::AbstractSet) = a ⊆ b for sets. I think that does push beyond the common semantics for < - which usually implies an order - way too far.

However, it is defined like that in Base, and Set, clearly behaves like a mathematical finite set in the ways that count (I think). So the question is, does it hurt (except the eyes) to have < as a kind of weird shorthand for for sets? Would we want to use < for sets in any other way (I'd say no)? But yes, it's ugly.

@oschulz
Copy link
Author

oschulz commented Nov 28, 2023

@aplavin : Are you sure you are talking about methods, not functions? I looked at a few of those, they don't seem feasible to generalize to intervals

Which others (apart from the very ugly < and <= discussed above) would you say are a bad fit?

Mostly it's because of fundamental interface differences

Which ones? The interface of AbstractSet is so narrow ...

@aplavin
Copy link
Contributor

aplavin commented Nov 28, 2023

Oh, lots of methods are a bad fit even if we just consider intervals! Sorry I'm always using intervals as a concrete example to keep in mind :)

# just a few methods defined on abstractsets

filter(pred, s::AbstractSet) = mapfilter(pred, push!, s, emptymutable(s))
function mapfilter(pred, f, itr, res)
    for x in itr
        pred(x) && f(res, x)
    end
    res
end

setdiff(s::AbstractSet, itrs...) = setdiff!(copymutable(s), itrs...)

function hash(s::AbstractSet, h::UInt)
    hv = hashs_seed
    for x in s
        hv ⊻= hash(x)
    end
    hash(hv, h)
end

Better to ask if there are any meaningful methods that are a good fit.

does it hurt (except the eyes) to have < as a kind of weird shorthand for ⊆ for sets

It's better to have interval/rectangle/... < comparison to error, as it does now. There's no general objective way to define which one is less.

And again, I think there are very few, if any, actual advantages to this subtyping. Do you have a specific concrete example of code that becomes simpler or more generic with Interval <: AbstractSet?

@hyrodium
Copy link

But where does it say that?

I'm not sure, and this is why I wrote "I think".
This is a deep problem in Julia. See What's bad about Julia? - Abstract interfaces are unenforced and undiscoverable for example.

@oschulz
Copy link
Author

oschulz commented Nov 28, 2023

If you guys think Base.AbstractSet is indeed too different and not be nature a mathematical set, then how about this proposal:

  • We create a lightweight package MathSets.jl or so under JuliaMath. It contains

    • An abstract type AbstractMathSet (or similar)

    • An abstract type AbstractFiniteSet with a concrete type FiniteSet that just wraps Base.Set, and conversions between both.

    • Unions SetLike = Union{AbstractSet, AbstractMathSet} and FiniteSetLike = Union{AbstractSet, AbstractFiniteSet}, so people don't have to define them themselves if they write code that accepts both.

    • Generic methods for union and intersect, and a generic iterate for generic unions.

I would still argue that the term "set" is better than "domain". While "domain" is sometimes understood to represent a set itself, I would say that way more often "domain" is used to state a relationship between a set and another mathematical object, or mathematical structure (e.g. "integral domain") that combines sets with operations. "mathematical set", however, is very clear.

But, while I think something like the proposal above would work well, I would still ask the question if AbstractSet is really not mathematical enough - using AbstractSet directly would be much simpler.

@oschulz
Copy link
Author

oschulz commented Nov 28, 2023

But where does it say that?
@hyrodium : I'm not sure, and this is why I wrote "I think". This is a deep problem in Julia [...]

Ah, sorry - yes, though I think less a fundamental problem with the language but more a lack of documentation in some places. :-)

On the other hand, I think we can take it as an opportunity - I'd say AbstractSet is fairly open, and we could propose additions to clarify what it means. And we could even propose deprecating < and <= for AbstractSet.

@oschulz
Copy link
Author

oschulz commented Nov 28, 2023

I did a bit of digging - the idea to introduce using isless for sets seems to trace back to the year 2012 - maybe one might call it a "primordial sin"? It looks like the unicode operators for subset didn't exist back then.

@aplavin
Copy link
Contributor

aplavin commented Nov 28, 2023

I definitely like the proposal for a special supertype more. Is that basically the DomainSetsCore proposal, just with a different name?

> Unions SetLike = Union{AbstractSet, AbstractMathSet} and FiniteSetLike = Union{AbstractSet, AbstractFiniteSet}, so people don't have to define them themselves if they write code that accepts both.

There are other common types that can naturally represent "sets of elements". Even Base provides lots of set operations for arrays. And there also are UniqueArrays.jl and other packages with structs that are fundamentally "sets".

That's why I'm saying there is little benefit in using Base.AbstractSet as a supertype, because a thin wrapper function like domain(collection)::Domain or as_set(collection)::MathSet would be needed anyway

@oschulz
Copy link
Author

oschulz commented Nov 28, 2023

Oh, lots of methods are a bad fit even if we just consider intervals! Sorry I'm always using intervals as a concrete example to keep in mind :)

Please do - if it doesn't make sense for intervals than it certainly won't make sense for more complex sets! :-)

I don't think filter and mapfilter are a good reason though - we can't have those for infinite arrays either, for example. hash could definitely be implemented for infinite set types. setdiff we could only implement for certain set types, but I'd say that's Ok.

When it comes to what the generic implementations for them do, I'd point to AbstractArray again. Many array types need to specialize methods beyond getindex, setindex! and size - not just for increased performance, but because the default implementations wouldn't work at all.

@oschulz
Copy link
Author

oschulz commented Nov 28, 2023

There are other common types that can naturally represent "sets of elements". Even Base provides lots of set operations for arrays. And there also are UniqueArrays.jl and other packages with structs that are fundamentally "sets".

Are they really though? A set is has by definition not sequence implied, but arrays are required not to reshuffle their elements. I think this is a fundamental difference.

@aplavin
Copy link
Contributor

aplavin commented Nov 28, 2023

"Sorted array" type is basically a variant of the "set" datastructure implementation. Aside from arrays, there are also Dictionaries.jl with their own "set" type. And surely lots more!

@oschulz
Copy link
Author

oschulz commented Nov 28, 2023

I definitely like the proposal for a special supertype more. Is that basically the DomainSetsCore proposal, just with a different name?

Not quite, I think. The DomainSetsCore proposal looks more like trait-centric interface. I would propose something more type-centric. I think sets are such a fundamental concept (like arrays), that going with (and encouraging people to subscribe to) a type hierarchy would work very well.

@oschulz
Copy link
Author

oschulz commented Nov 28, 2023

I can see that you guys aren't happy with the AbstractSet approach - but would you maybe have a concrete example of where it would really not work out cleanly?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants