Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP/ RFC/First PR: Add unzip to base.iterators.jl #21208

Closed
wants to merge 12 commits into from
Closed

WIP/ RFC/First PR: Add unzip to base.iterators.jl #21208

wants to merge 12 commits into from

Conversation

miguelraz
Copy link
Contributor

@StefanKarpinski Julia needs an underling to give this PR a shot. #13942.

  1. I know I'm clearly missing some important things, but I hope the effort counts. I couldn't find a way to get the rows from the array comprehension...
  2. Willing to start over from scratch - any comments/critiques/suggestions more than welcome.
  3. I am first trying a MWE with simple structure and inputs. Then I will try to get something that works more generically. Is this a good way to procceed?

[ci skip]

Roadmap:

  • Verify each returned array is the same type as the types of the elements
  • Verify lengths of all arrays to be unzipped
  • Return as a single Array of Arrays or as separate Arrays?
  • Remove that zip is its own inverse in @doc zip?
  • Revise documentation phrasing and example

~~@StefanKarpinski~~ Julia needs an underling to give this PR a shot. [#13942](#13942).
1. I know I'm clearly missing some important things, but I hope the effort counts. I couldn't find a way to get the rows from the array comprehension...
2. Willing to start over from scratch - any comments/critiques/suggestions more than welcome.
3. I am first trying a MWE with simple structure and inputs. Then I will try to get something that works more generically. Is this a good way to procceed?

[ci skip]

Roadmap:
* Verify each returned array is the same type as the types of the elements
* Verify lengths of all arrays to be unzipped
* Return as a single Array of Arrays or as separate Arrays?
* Remove that `zip is its own inverse` in `@doc zip`?
* Revise documentation phrasing and example
@kshyatt kshyatt added the needs tests Unit tests are required for this change label Mar 28, 2017
@JeffBezanson
Copy link
Member

JeffBezanson commented Mar 28, 2017

I am first trying a MWE with simple structure and inputs. Then I will try to get something that works more generically. Is this a good way to procceed?

Definitely; I almost always start with examples and then generalize.

If you want an array of arrays, you can use nested comprehensions.

Ideally this would work with arbitrary iterators, which do not necessarily support length and indexing. However it might be reasonable to assume that the objects inside the outer iterator do support indexing; this is certainly the case for the result of zip.

Remove that zip is its own inverse in @doc zip?

+1 to that

~~@StefanKarpinski~~ Julia needs an underling to give this PR a shot. [#13942](#13942).
1. I know I'm clearly missing some important things, but I hope the effort counts. I couldn't find a way to get the rows from the array comprehension...
2. Willing to start over from scratch - any comments/critiques/suggestions more than welcome.
3. I am first trying a MWE with simple structure and inputs. Then I will try to get something that works more generically. Is this a good way to procceed?

[ci skip]

Roadmap:
* Verify each returned array is the same type as the types of the elements
* Verify lengths of all arrays to be unzipped
* Return as a single Array of Arrays or as separate Arrays?
* Remove that `zip is its own inverse` in `@doc zip`?
* Revise documentation phrasing and example
@ararslan
Copy link
Member

ararslan commented Mar 28, 2017

If I may offer a couple suggestions, I would have the function return a tuple of iterators rather than an array of arrays, and I would restrict unzip to Zip objects. As an example:

unzip(iter::AbstractZipIterator) = _unzip(iter)
# This avoids exporting the method that isn't for AbstractZipIterators
_unzip(iter::Zip1) = (iter.a,)
_unzip(iter::Zip2) = (iter.a, iter.b)
_unzip(iter::Zip) = (iter.a, _unzip(iter.z)...)
_unzip(iter) = (iter,)

This makes use of the internal structure of the Zip objects to preserve the iterators that were passed to zip.

@miguelraz
Copy link
Contributor Author

miguelraz commented Mar 28, 2017

Welcome!

@JeffBezanson Thanks! Been lurking for a while. Time to be brave and attempt a commit.

If you want an array of arrays, you can use nested comprehensions.

Tried quite a few variations of that - still haven't got the hang of it yet.
I think I will go with @ararslan's advice and use the Zip object structre that is already in place - though I confess those functions still look like voodoo magic to me, so I am going to just copy paste this until I get a better hang of it.

I will try following the advice that it says to push onto the patch-3 branch on my own fork.
I have added tests and will need a comment on those too, if anyone is willing to help.
Will report results.

@miguelraz
Copy link
Contributor Author

@kshyatt Added tests

@ararslan
Copy link
Member

ararslan commented Mar 28, 2017

though I confess those functions still look like voodoo magic to me

I'll try to break it down so it's more understandable.

First we're defining a function to export called unzip that accepts objects of type Base.Iterators.AbstractZipIterator. The goal here is to losslessly (or as much so as we can) recover the input passed to zip.

unzip(iter::AbstractZipIterator) = _unzip(iter)

We're just making it call an internal, unexported function _unzip. Doing it this way ensures that when someone goes to use the unzip function, the only visible method is the one that takes AbstractZipIterators.

Now we define the internal function. This exploits the structure of the Zip objects. Here are the objects as defined in base/iterators.jl:

struct Zip1{I} <: AbstractZipIterator
    a::I
end

struct Zip2{I1, I2} <: AbstractZipIterator
    a::I1
    b::I2
end

struct Zip{I, Z<:AbstractZipIterator} <: AbstractZipIterator
    a::I
    z::Z
end

For Zip1, we see it has 1 field called a. So we add a method to _unzip to extract the a field, then wrap it in a tuple. This is

_unzip(iter::Zip1) = (iter.a,)

For Zip2, there are 2 fields, a and b, so we do the same as for Zip1:

_unzip(iter::Zip2) = (iter.a, iter.b)

Zip is a little different; one field is an iterator and the other is another Zip. Thus we can call _unzip recursively to completely unzip. To ensure we don't get nested tuples, we splat the result. This is

_unzip(iter::Zip) = (iter.a, _unzip(iter.z)...)

Then we define a fallback method that just wraps the input in a tuple. This is a safeguard against any potential infinite recursion in the Zip method.

_unzip(iter) = (iter,)

And that's it! Seem any less magical?

@miguelraz
Copy link
Contributor Author

miguelraz commented Mar 28, 2017

Damn, this is why I love Julia.
It definitely looks way less magical now!
@ararslan a trillion internet points to you kind sir.
I kinda started thinking that they were trying to setup a divide and conque algo, but it is now crystal clear.

I guess I have some questions still though:
Why do we like to return tuples so much? Why not arrays?

I have no idea how to proceed now, though I just added edited the doc lines to make sure that zip and unzip are inverses of each other as seconded by Jeff.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Mar 29, 2017

Why do we like to return tuples so much? Why not arrays?

Type inference can reason about tuples and the types of individual slots in them. Inference can only reason about the element type of an entire array, which in this case is not nearly as helpful.

@StefanKarpinski
Copy link
Member

Thanks for taking a crack at this @miguelraz!

@miguelraz
Copy link
Contributor Author

@StefanKarpinski Thanks to you guys for helping me learn!
Any comments/criticism/suggestions/advice more than welcome.
I won't promise results, but I will try my best at it.

Type inference can reason about tuples and the types of individual slots in them

Aaaah. And inference leads to speed. No wonder Base likes Tuples so much.

# unzip
let z = zip(1:2)
@test unzip(z) == 1:2
@test eltype(unzip(z)) == UnitRange{Int64}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UnitRange{Int} here since it will fail on systems with Int == Int32 (and same on the tests below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Corrected now.

@@ -240,6 +240,32 @@ end
iteratorsize{I1,I2}(::Type{Zip{I1,I2}}) = zip_iteratorsize(iteratorsize(I1),iteratorsize(I2))
iteratoreltype{I1,I2}(::Type{Zip{I1,I2}}) = and_iteratoreltype(iteratoreltype(I1),iteratoreltype(I2))

# unzip
```jldoctest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the code should be within ```jldoctest ... ``` (i.e. the parts with julia>)

Also, the complete docstring should be within """ ... """, take a look at the docstring for zip on the lines above in this file!

(Also note that the examples are outdated after you updated unzip to only work with AbstractZipIterators)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, tried to fix this up.
Review comments?

For iterable objects of the same length, returns iterable sets, where the ith set contains
the ith component of each input iterable object.

Note that the inverse of [`unzip`](@ref) is [`zip`](@ref).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps not necessary to reference unzip in its own documentation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But zip's docs say that zip's inverse is unzip.
(Or at least tried to indicate the inverse.)
... Decision time?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just meant the @ref part of unzip here, no need to link to itself. (And the docs for zip probably shouldn't have a @ref for itself either.)

@martinholters
Copy link
Member

While I agree that an unzip specialized for zip output makes a lot of sense, I also think a more general unzip that can handle any iterator that delivers tuples of constant size would be quite useful. That has not to be part of this PR, I just wanted to mention it before someone closes #13942 after this has been merged.

@StefanKarpinski
Copy link
Member

I would imagine that calling unzip on a zip would generally only be done in tests. After all, if you just zipped something, why are you unzipping it again?

# This avoids exporting the method that isn't for AbstractZipIterators
_unzip(iter::Zip1) = (iter.a,)
_unzip(iter::Zip2) = (iter.a, iter.b)
_unzip(iter::Zip) = (iter.a, unzip(iter.z)...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inner call to unzip here should actually be _unzip. (That was my typo initially, sorry about that.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Corrected now.

@@ -22,6 +22,22 @@ let z = zip(1:2, 3:4, 5:6)
@test eltype(z) == Tuple{Int,Int,Int}
end

# unzip
let z = zip(1:2)
@test unzip(z) == 1:2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since unzip always returns a Tuple, the right hand side of this test should be (1:2,)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Corrected as well.

@ararslan
Copy link
Member

After all, if you just zipped something, why are you unzipping it again?

That's a fair point, but if you haven't zipped something, why would you be unzipping it at all?

@dmbates
Copy link
Member

dmbates commented Mar 29, 2017 via email

@miguelraz
Copy link
Contributor Author

Work log:
Looked over reviews and pointers, tried to implement suggested fixes.
Thanks a zillion to everybody, this is a cool learning experience.
As always, any critiques/advice/suggestions/welcome.


```jldoctest
"""jldoctest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still not quite right, try looking at enumerate on line 35.

@@ -22,6 +22,22 @@ let z = zip(1:2, 3:4, 5:6)
@test eltype(z) == Tuple{Int,Int,Int}
end

# unzip
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this a good place to use @testset?

@@ -240,6 +241,34 @@ end
iteratorsize{I1,I2}(::Type{Zip{I1,I2}}) = zip_iteratorsize(iteratorsize(I1),iteratorsize(I2))
iteratoreltype{I1,I2}(::Type{Zip{I1,I2}}) = and_iteratoreltype(iteratoreltype(I1),iteratoreltype(I2))

# unzip
"""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have thought that unzip would need it's own method name, something like unzip(iter::AbstractZipIterator). Also, I think that you need to add unzip to exports.jl.

@vtjnash
Copy link
Member

vtjnash commented Apr 8, 2021

Replaced by #33515 (@StefanKarpinski there's a couple tests here, if you don't want to write your own 😜)

@vtjnash vtjnash closed this Apr 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs tests Unit tests are required for this change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants