Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

keys(::Generator) for find* and arg* #34674

Closed
rapus95 opened this issue Feb 5, 2020 · 9 comments · Fixed by #34678
Closed

keys(::Generator) for find* and arg* #34674

rapus95 opened this issue Feb 5, 2020 · 9 comments · Fixed by #34678

Comments

@rapus95
Copy link
Contributor

rapus95 commented Feb 5, 2020

Tl;Dr:
Add

Base.keys(gen::Base.Generator) = Base.keys(gen.iter)

to Base.


Partially related to my comment in #34368 (comment) regarding the support of traits on generators.
Currently there is no way to use find* on generators. Reason is that there is no keys function:

julia> findmax(abs(x-1) for x in [1,5,2,9])
ERROR: MethodError: no method matching keys(::Base.Generator{Array{Int64,1},var"#97#98"})

But since generators are said to be the lazy variant of map and map preserves the indices it would be good to define that missing keys method.

That would allow for some very generic findmax(iter) and argmax(iter) as arbitrary lazy maps are allowed while maintaining good performance and low memory footprint.

Edit: adding the missing find* and arg* with custom transform functions would be appreciated aswell to allow for the convenient do-block notation.

@rapus95 rapus95 changed the title findX on Generator vs map keys(::Generator) for find* and arg* Feb 5, 2020
@mkborregaard
Copy link
Contributor

If you have a specific code change suggestion, why don't you go directly to PR?

@rapus95
Copy link
Contributor Author

rapus95 commented Feb 5, 2020

If you have a specific code change suggestion, why don't you go directly to PR?

Well, I didn't expect such a far-reaching change to go without discussion. But I can open a PR aswell.

@JeffBezanson
Copy link
Member

This is maybe ok, but what I would want argmax(f, x) to do is return the value from x (not the key of x) that maximizes f (I think there is an issue about this). If we feel these are consistent and not too much of a gotcha then this is probably ok.

@JeffBezanson
Copy link
Member

See #27612 and #27613 for some related discussion.

@rapus95
Copy link
Contributor Author

rapus95 commented Feb 6, 2020

So, to get it right, that's the actual proposal right now (?):
Using the 2-arg version of find* and arg* will somewhat "lift" the level at which the argument/index is returned:
findmax(("well", "done", "will", "work")) == ("work", 4)

findmax(reverse(x) for x in ("well", "done", "will", "work")) == ("lliw", 4))

findmax(reverse, ("well", "done", "will", "work")) == ("lliw", "will"))

arg* will consistently return only the 2nd argument: arg*(args...) = find*(args...)[2]
and keys will fill-in the missing pieces.
There's only one caveat: findmax(itr) != findmax(identity, itr) == maximum(itr) and likewise for the others. Formally we have
find*(iter) == find*(k->getindex(iter, k), keys(iter)) and
find*(f, iter) == ((val, key) = find*(f(x) for x in iter); val, iter[key])

Thus:

"""
    argmax(f, iter)
returns the element of `iter` which maximizes `f`. No `keys(iter)` required.

    argmax(iter)
returns the key which is associated with the largest value. Requires `keys(iter)` to be defined.
"""

Even though that shift of index-level feels confusing at the first moment and not consistent between 1- and 2-arg version it is the right way IMO:

The 2-arg version is the natural definition from mathematics. Take that as given.

The 1-arg version by itself doesn't make much sense because it is just an iterable of values. Thus, we try to fill in the best meaning we can come up with for the 1-arg version.

  1. If you want the maximum -> call maximum (findmax would always return (x,x))
  2. But if you want some sense of index (which we indeed look for), then returning a key is the more generic approach as keys do not need to be ordered while indices in the mathematical meaning usually require some sort of partial ordering, aka each index can be used as key but not vice versa.

@tkf
Copy link
Member

tkf commented Feb 7, 2020

I think another way to look at is that we have two kinds of methods

argmax(f, domain)
argmax(f)

where the second kind of method (including argmax(::AbstractArray)) is for the case the "domain" is obvious by the definition of f. If you consider an array as a mapping from indices to values, this fits in the second kind of method. This has a natural extension to any indexable objects including Dict (which is sometimes called mapping). Of course, arrays and dicts are not callable in Julia and this view is not really perfect. But it may help conceptualizing argmax, maximum, and findmax consistently; i.e.,

  • argmax(f, [domain]) returns a point in the domain
  • maximum(f, [domain]) returns a point in the codomain
  • findmax(f, [domain]) returns a pair of a point in the domain and a point the codomain

By the way, it would be nice to extend this to Iterators.Filter as well so that you can do something like findmax(x for x in [1,7,7,NaN] if !isnan(x)).

@tkf
Copy link
Member

tkf commented Feb 8, 2020

I just realized what I wrote is not really solving the issue of argmax(::Generator) unless getindex(::Generator, ...) is defined.

I think it's much more straight forward to convince people for supporting argmax(::Broadcasted) because getindex is already defined for Broadcasted. Of course, it'd mean that we need #19198. If we solve it, e.g., with #31553, then we'd have argmax(for f.(xs)). I think we should just define findmax/findmin in terms of reduce. It would then give us, e.g., argmax(::Broadcasted) via #31020.

@rapus95
Copy link
Contributor Author

rapus95 commented Feb 28, 2020

Bump:
Is it acceptable that find*(itr) != find*(identity, itr) in general?

While in general Base.keys(g::Base.Generator) = g.iter would also be a valid choice IMO, that would violate the contract of being the lazy variant of map. So I guess to really be the lazy map we actually need the originally proposed method, no matter what that will imply for 2-arg variants:
Base.keys(gen::Base.Generator) = Base.keys(gen.iter)

We may also consider adding
Base.getindex(gen::Base.Generator, idx...) = gen.f(gen.iter[idx...])
so that indexing will work whenever the iterator which founds the generator is indexable itself.

@StefanKarpinski StefanKarpinski added the triage This should be discussed on a triage call label Feb 28, 2020
@JeffBezanson
Copy link
Member

Is it acceptable that find*(itr) != find*(identity, itr) in general?

Yes I think so; it's like the difference between pairs(array) and zip(foo, array).

@JeffBezanson JeffBezanson removed the triage This should be discussed on a triage call label Mar 19, 2020
cmcaine added a commit to cmcaine/julia that referenced this issue Mar 30, 2020
cmcaine added a commit to cmcaine/julia that referenced this issue Mar 30, 2020
cmcaine added a commit to cmcaine/julia that referenced this issue Mar 30, 2020
cmcaine added a commit to cmcaine/julia that referenced this issue Nov 21, 2020
cmcaine added a commit to cmcaine/julia that referenced this issue Dec 17, 2020
StefanKarpinski pushed a commit that referenced this issue Dec 18, 2020
Defines a descending total order, `isgreater` (not exported), where unordered values like NaNs and missing are last. This makes defining min, argmin, etc, simpler and more consistent. Also adds 2-arg versions of findmax/min, argmax/min. Defines and exports the `isunordered` predicate for testing whether a value is unordered like NaN and missing. Fixes #27613. Related: #27639, #27612, #34674.

Thanks to @tkf, @StefanKarpinski and @drewrobson for their assistance with this PR.

Co-authored-by: Jameson Nash <[email protected]>
Co-authored-by: Takafumi Arakaki <[email protected]>
ElOceanografo pushed a commit to ElOceanografo/julia that referenced this issue May 4, 2021
Defines a descending total order, `isgreater` (not exported), where unordered values like NaNs and missing are last. This makes defining min, argmin, etc, simpler and more consistent. Also adds 2-arg versions of findmax/min, argmax/min. Defines and exports the `isunordered` predicate for testing whether a value is unordered like NaN and missing. Fixes JuliaLang#27613. Related: JuliaLang#27639, JuliaLang#27612, JuliaLang#34674.

Thanks to @tkf, @StefanKarpinski and @drewrobson for their assistance with this PR.

Co-authored-by: Jameson Nash <[email protected]>
Co-authored-by: Takafumi Arakaki <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants