Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Structured ByteView Access (underlying StringView/BinaryView representation) #5736

Closed
tustvold opened this issue May 8, 2024 · 7 comments
Closed
Assignees
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog

Comments

@tustvold
Copy link
Contributor

tustvold commented May 8, 2024

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

There have been a couple of proposals to improve the ergonomics of interacting with byte views

The current design of the byte views assumes calling sites know the structure of the u128, #5735 tweaks this a little bit but the onus is still on the user to understand the layout of the views.

Describe the solution you'd like

TBD

Describe alternatives you've considered

Additional context

@tustvold tustvold added the enhancement Any new improvement worthy of a entry in the changelog label May 8, 2024
@alamb alamb changed the title Structured ByteView Access Structured ByteView Access (underlying StringView/BinaryView representation) May 13, 2024
@alamb
Copy link
Contributor

alamb commented May 13, 2024

Related discussion #5654 (comment)

I think this basically means the intention is that we (well @tustvold ) plans to replace u128 in ByteViewArray with an actual structured type of some sort

@alamb
Copy link
Contributor

alamb commented May 28, 2024

#5796 is up for feedback

@tustvold
Copy link
Contributor Author

tustvold commented May 30, 2024

#5786 provides a high-level interface for constructing views, which mirrors the high-level interface exposed by ByteViewArray for accessing them.

This means users shouldn't ever need to deal with the underlying representation. I think we therefore could just close this ticket.

For completeness I have spent some time trying to come up with a better abstraction than u128 and have struggled.

There are a couple of major challenges:

  • String prefixes are not necessarily valid unicode
  • Certain operations exploit the layout in ways LLVM seems unable to come up with, e.g.
    • Compare the length and prefix in a single u64 comparison
    • Elide length checking once checked length and prefix together

That isn't to say a better abstraction isn't possible, but it is fiddly and the major forcing function of construction is resolved by #5786. I think a smaller incremental PR like #5735 may therefore be all we need for the time being. We can always revisit in 3 months time when we look to do the next breaking release.

@alamb
Copy link
Contributor

alamb commented May 30, 2024

I like the plan of going with what we have (which is now a bit nicer) and waiting another three months for any breaking change

We can always add things on top of the u128s (e.g. similar to #5619) if we need something more ergonomic

@alamb
Copy link
Contributor

alamb commented Jun 11, 2024

@tustvold do you have any more plans for this issue in the next week? I think @XiangpengHao is going to be helping work on ByteView features over the next few days but also that that shouldn't really affect this ticket

@tustvold
Copy link
Contributor Author

I do not intend to do anything further with this ticket at this time, no

@alamb alamb added the arrow Changes to the arrow crate label Jul 2, 2024
@alamb
Copy link
Contributor

alamb commented Jul 2, 2024

label_issue.py automatically added labels {'arrow'} from #5619

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

No branches or pull requests

2 participants