PyVortex #729

Merged: 12 commits, Sep 5, 2024

@danking (Member) commented Sep 4, 2024

PyVortex
--------

The generated documentation for this branch is available at https://spiraldb.github.io/vortex/docs/

The Python package is now structured like this:

- `vortex`
  - `array()`: converts a list or an Arrow array into a Vortex array.
  - `encodings`
    - `Array`: In Rust this is called `PyArray`; it is just a PyO3 wrapper around a Vortex Rust `Array`.
      - `to_pandas`
      - `to_numpy`
    - `compress()`: compresses an `Array`.
  - `dtype`: a module containing dtype constructors, e.g. `uint(32, nullable=False)`.
  - `io`: readers and writers, which currently only work for Struct arrays without top-level nulls.
    - `read()`
    - `write()`
  - `expr`
    - `Expr`: a class, implemented in Rust, which constructs vortex-exprs using the usual Python operators.
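The `expr` entry describes building expressions from ordinary operators. The hypothetical Rust sketch below shows only the underlying idea: operator traits construct an expression tree instead of evaluating anything. The `Expr` enum, `col`, `lit`, `gt`, and `equals` are illustrative names, not Vortex's API. In a PyO3 class the same role would be played by Python dunder methods, for example `__richcmp__` and `__and__`.

```rust
use std::fmt;
use std::ops::BitAnd;

/// Hypothetical expression tree; an illustration, not Vortex's actual `Expr`.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Gt(Box<Expr>, Box<Expr>),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Reference a column by name (hypothetical helper).
fn col(name: &str) -> Expr {
    Expr::Column(name.to_string())
}

/// Wrap an integer literal (hypothetical helper).
fn lit(value: i64) -> Expr {
    Expr::Literal(value)
}

impl Expr {
    // Rust's `>` and `==` must return `bool`, so comparisons are builder methods here.
    // Python's `__gt__`/`__eq__` may return any object, so there they can return an Expr.
    fn gt(self, rhs: Expr) -> Expr {
        Expr::Gt(Box::new(self), Box::new(rhs))
    }

    fn equals(self, rhs: Expr) -> Expr {
        Expr::Eq(Box::new(self), Box::new(rhs))
    }
}

// `a & b` builds a conjunction node; nothing is evaluated.
impl BitAnd for Expr {
    type Output = Expr;

    fn bitand(self, rhs: Expr) -> Expr {
        Expr::And(Box::new(self), Box::new(rhs))
    }
}

// Render the tree with explicit parentheses so its shape is visible.
impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Column(name) => write!(f, "{name}"),
            Expr::Literal(value) => write!(f, "{value}"),
            Expr::Gt(l, r) => write!(f, "({l} > {r})"),
            Expr::Eq(l, r) => write!(f, "({l} == {r})"),
            Expr::And(l, r) => write!(f, "({l} & {r})"),
        }
    }
}

fn main() {
    let filter = col("x").gt(lit(5)) & col("y").equals(lit(1));
    println!("{filter}"); // ((x > 5) & (y == 1))
}
```

Because the operators only allocate nodes, the resulting tree can be inspected, printed, or handed to a reader for pushdown before any data is touched.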

I also added `python_repr`, which returns a `Display`-able struct that renders itself in the Python `repr` style. In particular, dtypes render as `uint(32, False)` rather than `u32`.
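The `python_repr` pattern can be sketched as a borrowing wrapper whose `Display` impl emits Python-constructor syntax, while the type's own `Display` keeps the terse Rust form. This is a hypothetical sketch: `DType`, `uint`, and `PythonRepr` are simplified stand-ins, the `?` suffix for nullable types is this sketch's own convention, and only the `uint(32, False)` versus `u32` rendering comes from the description above.

```rust
use std::fmt;

/// Simplified, hypothetical stand-in for a dtype.
#[derive(Clone, Copy, Debug, PartialEq)]
enum DType {
    UInt { width: u8, nullable: bool },
    Bool { nullable: bool },
}

/// Hypothetical constructor mirroring the Python `uint(32, nullable=False)`.
fn uint(width: u8, nullable: bool) -> DType {
    DType::UInt { width, nullable }
}

/// Borrowing wrapper whose only job is to format a DType in Python `repr` style.
struct PythonRepr<'a>(&'a DType);

impl DType {
    fn python_repr(&self) -> PythonRepr<'_> {
        PythonRepr(self)
    }
}

// Terse Rust-style rendering, e.g. `u32` (nullable types get a `?` in this sketch).
impl fmt::Display for DType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let suffix = |n: bool| if n { "?" } else { "" };
        match self {
            DType::UInt { width, nullable } => write!(f, "u{width}{}", suffix(*nullable)),
            DType::Bool { nullable } => write!(f, "bool{}", suffix(*nullable)),
        }
    }
}

// Python-style rendering: Rust booleans are printed as Python's `True`/`False`.
impl fmt::Display for PythonRepr<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let py_bool = |b: bool| if b { "True" } else { "False" };
        match self.0 {
            DType::UInt { width, nullable } => write!(f, "uint({width}, {})", py_bool(*nullable)),
            DType::Bool { nullable } => write!(f, "bool({})", py_bool(*nullable)),
        }
    }
}

fn main() {
    let dt = uint(32, false);
    println!("{dt}"); // u32
    println!("{}", dt.python_repr()); // uint(32, False)
}
```

Returning a small wrapper rather than a `String` keeps formatting lazy and lets callers use it anywhere a `Display` value is accepted.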

I think the only bugfixes in this PR are:

1. pyvortex/src/encode.rs: propagate the nullability from Arrow to `Array::from_arrow`.
2. arrow/recordbatch.rs and arrow/dtype.rs: return compatible nullability and validity.
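Both fixes concern keeping a dtype's nullability and its validity in agreement. A hypothetical Rust sketch of that invariant: the nullability is taken from the Arrow field instead of being assumed, and a column declared non-nullable is rejected if its mask contains nulls. `ArrowField`, `Validity`, `Column`, `from_arrow`, and `is_consistent` are illustrative stand-ins, not the actual arrow-rs or Vortex signatures.

```rust
/// Hypothetical stand-in for an Arrow field; only its nullability matters here.
struct ArrowField {
    nullable: bool,
}

/// Hypothetical validity representation.
#[derive(Debug, PartialEq)]
enum Validity {
    NonNullable,
    AllValid,
    Mask(Vec<bool>), // true = valid (non-null)
}

#[derive(Debug, PartialEq)]
struct Column {
    nullable: bool,
    validity: Validity,
}

/// Build a column whose nullability comes from the Arrow field and whose
/// validity agrees with it.
fn from_arrow(field: &ArrowField, mask: Option<Vec<bool>>) -> Result<Column, String> {
    let validity = match (field.nullable, mask) {
        // A non-nullable field that actually contains nulls is a contradiction.
        (false, Some(m)) if m.iter().any(|&valid| !valid) => {
            return Err("non-nullable field contains nulls".to_string())
        }
        (false, _) => Validity::NonNullable,
        // Nullable field without a mask: every row is valid.
        (true, None) => Validity::AllValid,
        (true, Some(m)) => Validity::Mask(m),
    };
    Ok(Column { nullable: field.nullable, validity })
}

/// Invariant: non-nullable dtype <=> `NonNullable` validity.
fn is_consistent(col: &Column) -> bool {
    col.nullable != (col.validity == Validity::NonNullable)
}

fn main() {
    let col = from_arrow(&ArrowField { nullable: true }, Some(vec![true, false])).unwrap();
    println!("{:?} consistent={}", col, is_consistent(&col));
}
```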

Future Work
-----------

1. Automatically generate and deploy the documentation to github.io.
2. Run `cd pyvortex/docs && make doctest` on every commit.

@danking requested a review from @robert3005 September 4, 2024 19:38

@danking force-pushed the dk/python-documentation-struct-read-projection branch from 67a213d to 4d55c61 September 4, 2024 19:40
@robert3005 (Member) left a comment:

I think we can move some things around. Creating a tokio runtime is a bit annoying, but it's a future problem to solve.

Outdated, resolved review threads:

- vortex-dtype/src/dtype.rs
- vortex-dtype/src/extension.rs
- pyvortex/src/dtype.rs
- pyvortex/src/io.rs
- pyvortex/src/io.rs
- pyvortex/src/array.rs
@danking force-pushed the dk/python-documentation-struct-read-projection branch 5 times, most recently from 3f31390 to 2bfb65b, September 5, 2024 15:05
@danking force-pushed the dk/python-documentation-struct-read-projection branch from f1bd829 to 52851b3 September 5, 2024 16:43

@danking requested a review from @robert3005 September 5, 2024 16:46
`@@ -26,8 +31,12 @@ impl PyDType {`

```rust
        format!("{}", self.inner)
    }

    fn __repr__(&self) -> String {
        format!("{}", self.inner.python_repr())
```
Review comment (Member):

This can be:

Suggested change:

```diff
- format!("{}", self.inner.python_repr())
+ self.inner.python_repr().to_string()
```

Reply (Member, PR author):

Done.
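The suggestion works because the standard library provides the blanket `impl<T: fmt::Display + ?Sized> ToString for T`, so for any `Display` type `x.to_string()` produces the same string as `format!("{}", x)` while stating the intent more directly. A small self-contained check, using a hypothetical `Repr` type in place of the value returned by `python_repr()`:

```rust
use std::fmt;

/// Hypothetical stand-in for the Display-able struct returned by `python_repr()`.
struct Repr(u32);

impl fmt::Display for Repr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "uint({}, False)", self.0)
    }
}

fn main() {
    let r = Repr(32);
    // Both calls go through the same `Display` impl;
    // `to_string` comes from the blanket `ToString` impl.
    let via_format = format!("{}", r);
    let via_to_string = r.to_string();
    assert_eq!(via_format, via_to_string);
    println!("{via_to_string}"); // uint(32, False)
}
```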

@danking enabled auto-merge (squash) September 5, 2024 17:48

@danking merged commit e3a6c5a into develop Sep 5, 2024 (4 checks passed)

@danking deleted the dk/python-documentation-struct-read-projection branch September 5, 2024 18:01

@danking mentioned this pull request Sep 5, 2024