updated README #876

Merged
merged 6 commits into from
Sep 23, 2024

Conversation

lwwmanning
Member

No description provided.

Member

@danking danking left a comment

I think my general bias is against superlatives, but I didn’t comment on each one.

README.md Outdated
Vortex is a toolkit for working with compressed Apache Arrow arrays in-memory, on-disk, and over-the-wire.

Vortex is designed to be to columnar file formats what Apache DataFusion is to query engines (or, analogously,
what LLVM + Clang are to compilers): a highly extensible & extremely performant *framework* for building a modern

I prefer fast to performant personally.

README.md Outdated

Vinyl is an aspiring successor to Apache Parquet, with dramatically faster random access reads (100-200x faster)
and scans (2-10x faster), while preserving approximately the same compression ratio and write throughput. It will also support very wide
tables (at least 10s of thousands of columns) and (in the future) on-device decompression on GPUs.

I’m hoping for millions; is that too speculative to include here?

We already lazily deserialize the schema, so it might be worth benchmarking the overhead of an arbitrary number of columns versus reading a selected few. I think the layouts get bigger and we have to read them all.

README.md Outdated
Optimized for random access reads and extremely fast scans; an aspiring successor to Apache Parquet.


[1]: Because like its predecessor, it's a type of [flooring](https://en.wikipedia.org/wiki/Parquet), but it is much better for

❤️

README.md Outdated
Vortex is a toolkit for working with compressed Apache Arrow arrays in-memory, on-disk, and over-the-wire.

Vortex is designed to be to columnar file formats what Apache DataFusion is to query engines (or, analogously,
what LLVM + Clang are to compilers): a highly extensible & extremely performant *framework* for building a modern

Clang is the C++ compiler built on LLVM. In this analogy it's just LLVM.

@lwwmanning lwwmanning enabled auto-merge (squash) September 23, 2024 12:28
@lwwmanning lwwmanning disabled auto-merge September 23, 2024 12:29
@lwwmanning lwwmanning merged commit bf82687 into develop Sep 23, 2024
5 checks passed
@lwwmanning lwwmanning deleted the wm/readme branch September 23, 2024 12:29
3 participants