updated README #876
I think my general bias is against superlatives, but I didn’t comment on each one.
README.md
Outdated
Vortex is a toolkit for working with compressed Apache Arrow arrays in-memory, on-disk, and over-the-wire.

Vortex is designed to be to columnar file formats what Apache DataFusion is to query engines (or, analogously,
what LLVM + Clang are to compilers): a highly extensible & extremely performant *framework* for building a modern
I prefer "fast" to "performant", personally.
README.md
Outdated
Vinyl is an aspiring successor to Apache Parquet, with dramatically faster random access reads (100-200x faster)
and scans (2-10x faster), while preserving approximately the same compression ratio and write throughput. It will also support very wide
tables (at least 10s of thousands of columns) and (in the future) on-device decompression on GPUs.
I’m hoping for millions; is that too speculative to include here?
We already lazily deserialize the schema, so it might be worth benchmarking the overhead of an arbitrary number of columns versus reading a selected few. I think the layouts get bigger, and we have to read them all.
README.md
Outdated
Optimized for random access reads and extremely fast scans; an aspiring successor to Apache Parquet.

[1]: Because like its predecessor, it's a type of [flooring](https://en.wikipedia.org/wiki/Parquet), but it is much better for
❤️
README.md
Outdated
Vortex is a toolkit for working with compressed Apache Arrow arrays in-memory, on-disk, and over-the-wire.

Vortex is designed to be to columnar file formats what Apache DataFusion is to query engines (or, analogously,
what LLVM + Clang are to compilers): a highly extensible & extremely performant *framework* for building a modern
Clang is the C++ compiler using LLVM. In this analogy it's just the LLVM