Commit
PyVortex
--------

The generated documentation for this branch is available at https://spiraldb.github.io/vortex/docs/

The Python package is now structured like this:

- `vortex`
  - `array()`: converts a list or an Arrow array into a Vortex array.
  - `encoding`
    - `Array`: In Rust this is called a PyArray; it is just a PyO3 wrapper around a Vortex Rust Array.
      - `to_pandas`
      - `to_numpy`
    - `compress()`: compresses an Array.
  - `dtype`: A module containing dtype constructors, e.g. `uint(32, nullable=False)`.
  - `io`: Readers and writers, which currently only work for Struct arrays without top-level nulls.
    - `read()`
    - `write()`
  - `expr`
    - `Expr`: a class, implemented in Rust, which constructs vortex-exprs using the obvious Python operators.

I also added `python_repr`, which returns a Display-able struct that renders itself in the Python `repr` style. In particular, the dtypes look like `uint(32, False)` rather than `u32`.

I think the only bugfixes in this PR are:

1. pyvortex/src/encode.rs: propagate the nullability from Arrow to `Array::from_arrow`.
2. arrow/recordbatch.rs and arrow/dtype.rs need to return compatible nullability and validity.

Future Work
-----------

1. Automatically generate and deploy the documentation to github.io.
2. Run `cd pyvortex/docs && make doctest` on every commit.
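
The doctest item under Future Work can be approximated locally with the standard-library `doctest` module. The following is a minimal sketch, not the project's `make doctest` target: `Box`, `make_box`, and `run_docstring_examples` are hypothetical names. It shows that the `ELLIPSIS` flag is what allows a `...` in expected output to match a memory address, as in the `<pyarrow.lib.Int64Array object at ...>` lines of the docstrings in this commit.

```python
import doctest


class Box:
    """Hypothetical object whose default repr embeds a memory address."""


def make_box():
    """Return a new Box.

    >>> make_box()
    <...Box object at ...>
    """
    return Box()


def run_docstring_examples(func, flags=doctest.ELLIPSIS):
    """Run the examples in func's docstring; return (failed, attempted)."""
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False, optionflags=flags)
    # Discard the textual failure report; only the counts are needed here.
    results = runner.run(test, out=lambda text: None)
    return results.failed, results.attempted
```

Calling `run_docstring_examples(make_box)` runs the single example with `ELLIPSIS` enabled; passing `flags=0` runs the same example with exact matching, which cannot match a real address.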
Showing 26 changed files with 1,493 additions and 191 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
Array Data Types | ||
================ | ||
|
||
.. automodule:: vortex.dtype | ||
:members: | ||
:imported-members: | ||
|
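
The commit message notes that dtypes render in Python `repr` style, such as `uint(32, False)`, rather than the Rust-style `u32`. The pure-Python sketch below illustrates the two renderings side by side. `UIntType`, `rust_style`, and this `uint` function are hypothetical stand-ins, not the actual `vortex.dtype` implementation, which lives in Rust.

```python
from dataclasses import dataclass


@dataclass(frozen=True, repr=False)
class UIntType:
    """Hypothetical unsigned-integer dtype used only for illustration."""

    bit_width: int
    nullable: bool = False

    def rust_style(self) -> str:
        # Compact rendering; the trailing "?" for nullable types is an assumption.
        return f"u{self.bit_width}" + ("?" if self.nullable else "")

    def __repr__(self) -> str:
        # Python-style rendering that mirrors the constructor call.
        return f"uint({self.bit_width}, {self.nullable})"


def uint(bit_width: int, nullable: bool = False) -> UIntType:
    """Hypothetical constructor mirroring the `uint(32, nullable=False)` spelling."""
    return UIntType(bit_width, nullable)
```

One benefit of the constructor-style rendering is that the printed form can be pasted back into Python: in this sketch, `eval(repr(uint(8, True)))` compares equal to `uint(8, True)`.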
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
Arrays | ||
====== | ||
|
||
.. automodule:: vortex.encoding | ||
:members: | ||
:imported-members: | ||
:special-members: __len__ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
Row Filter Expressions | ||
====================== | ||
|
||
.. automodule:: vortex.expr | ||
:members: | ||
:imported-members: |
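
The commit message describes `Expr` as a Rust-implemented class that builds vortex-exprs from ordinary Python operators. A minimal pure-Python sketch of that operator-overloading pattern follows. `Node`, `Column`, `Literal`, `BinaryExpr`, and `evaluate` are hypothetical names chosen for illustration and say nothing about the real `vortex.expr` API.

```python
import operator

_OPERATORS = {
    ">": operator.gt,
    "<": operator.lt,
    "==": operator.eq,
    "&": operator.and_,
    "|": operator.or_,
}


class Node:
    """Base class: Python operators build a tree instead of computing a value."""

    def _binary(self, symbol, other):
        # Wrap plain Python values so both operands are tree nodes.
        rhs = other if isinstance(other, Node) else Literal(other)
        return BinaryExpr(symbol, self, rhs)

    def __gt__(self, other):
        return self._binary(">", other)

    def __lt__(self, other):
        return self._binary("<", other)

    def __eq__(self, other):
        # Overriding __eq__ means nodes no longer support ordinary equality.
        return self._binary("==", other)

    def __and__(self, other):
        return self._binary("&", other)

    def __or__(self, other):
        return self._binary("|", other)


class Column(Node):
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"col({self.name!r})"

    def evaluate(self, row):
        return row[self.name]


class Literal(Node):
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return repr(self.value)

    def evaluate(self, row):
        return self.value


class BinaryExpr(Node):
    def __init__(self, symbol, lhs, rhs):
        self.symbol, self.lhs, self.rhs = symbol, lhs, rhs

    def __repr__(self):
        return f"({self.lhs!r} {self.symbol} {self.rhs!r})"

    def evaluate(self, row):
        # Evaluate both children against one row, then apply the operator.
        return _OPERATORS[self.symbol](self.lhs.evaluate(row), self.rhs.evaluate(row))


rows = [
    {"name": "Joseph", "age": 25},
    {"name": "Narendra", "age": 31},
    {"name": "Angela", "age": 33},
    {"name": "Mikhail", "age": 57},
]
row_filter = (Column("age") > 30) & (Column("name") == "Angela")
matches = [row["name"] for row in rows if row_filter.evaluate(row)]
```

The parentheses around each comparison are required because `&` binds more tightly than `>` and `==` in Python.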
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
Input and Output | ||
================ | ||
|
||
.. automodule:: vortex.io | ||
:members: | ||
:imported-members: |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,7 +5,9 @@ description = "Add your description here" | |
authors = [ | ||
{ name = "Nicholas Gates", email = "[email protected]" } | ||
] | ||
dependencies = [] | ||
dependencies = [ | ||
"pydata-sphinx-theme>=0.15.4", | ||
] | ||
requires-python = ">= 3.11" | ||
classifiers = ["Private :: Do Not Upload"] | ||
|
||
|
@@ -17,7 +19,10 @@ build-backend = "maturin" | |
managed = true | ||
dev-dependencies = [ | ||
"pyarrow>=15.0.0", | ||
"pip" | ||
"pip", | ||
"sphinx>=8.0.2", | ||
"ipython>=8.26.0", | ||
"pandas>=2.2.2", | ||
] | ||
|
||
[tool.maturin] | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,9 @@ | ||
from ._lib import * # noqa: F403 | ||
from . import encoding | ||
from ._lib import __doc__ as module_docs | ||
from ._lib import dtype, expr, io | ||
|
||
__doc__ = module_docs | ||
del module_docs | ||
array = encoding.array | ||
|
||
__all__ = ["array", "dtype", "expr", "io", "encoding"] |
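
A note on `__all__`: Python expects every entry to be a string naming a public attribute, and a star-import fails when an entry is a module object instead. The sketch below demonstrates this with a throwaway module; `demo_pkg` and `star_import` are hypothetical names used only for the demonstration.

```python
import sys
import types


def star_import(all_value):
    """Build a throwaway module with the given __all__ and star-import it."""
    mod = types.ModuleType("demo_pkg")  # hypothetical package name
    mod.array = lambda obj: obj
    mod.dtype = types.ModuleType("demo_pkg.dtype")
    mod.__all__ = all_value
    sys.modules["demo_pkg"] = mod
    namespace = {}
    try:
        # `import *` is only legal at module level, so run it through exec.
        exec("from demo_pkg import *", namespace)
    finally:
        del sys.modules["demo_pkg"]
    # Drop entries such as __builtins__ that exec adds on its own.
    return sorted(name for name in namespace if not name.startswith("__"))
```

With string entries, `star_import(["array", "dtype"])` returns the imported names; with a module object in the list, the star-import raises `TypeError`.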
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,166 @@ | ||
import pyarrow | ||
|
||
from ._lib import encoding as _encoding | ||
|
||
__doc__ = _encoding.__doc__ | ||
|
||
Array = _encoding.Array | ||
compress = _encoding.compress | ||
|
||
|
||
def _Array_to_pandas(self: _encoding.Array, *, name: str | None = None, flatten: bool = False): | ||
"""Construct a Pandas dataframe from this Vortex array. | ||
Parameters | ||
---------- | ||
name : :class:`str`, optional | ||
The name of the column in the newly created dataframe. If unspecified, defaults to `x`. | ||
flatten : :class:`bool` | ||
If :obj:`True`, Struct columns are flattened in the dataframe. See the examples. | ||
Returns | ||
------- | ||
:class:`pandas.DataFrame` | ||
Examples | ||
-------- | ||
Construct a :class:`.pandas.DataFrame` with one column named `animals` from the contents of a Vortex | ||
array: | ||
>>> array = vortex.encoding.array(['dog', 'cat', 'mouse', 'rat']) | ||
>>> array.to_pandas(name='animals') | ||
animals | ||
0 dog | ||
1 cat | ||
2 mouse | ||
3 rat | ||
Construct a :class:`.pandas.DataFrame` with the default column name: | ||
>>> array = vortex.encoding.array(['dog', 'cat', 'mouse', 'rat']) | ||
>>> array.to_pandas() | ||
x | ||
0 dog | ||
1 cat | ||
2 mouse | ||
3 rat | ||
Construct a dataframe with a Struct-typed column: | ||
>>> array = vortex.encoding.array([ | ||
... {'name': 'Joseph', 'age': 25}, | ||
... {'name': 'Narendra', 'age': 31}, | ||
... {'name': 'Angela', 'age': 33}, | ||
... {'name': 'Mikhail', 'age': 57}, | ||
... ]) | ||
>>> array.to_pandas() | ||
x | ||
0 {'age': 25, 'name': 'Joseph'} | ||
1 {'age': 31, 'name': 'Narendra'} | ||
2 {'age': 33, 'name': 'Angela'} | ||
3 {'age': 57, 'name': 'Mikhail'} | ||
Lift the struct fields to the top-level in the dataframe: | ||
>>> array.to_pandas(flatten=True) | ||
x.age x.name | ||
0 25 Joseph | ||
1 31 Narendra | ||
2 33 Angela | ||
3 57 Mikhail | ||
""" | ||
name = name or "x" | ||
table = pyarrow.Table.from_arrays([self.to_arrow()], [name]) | ||
if flatten: | ||
table = table.flatten() | ||
return table.to_pandas() | ||
|
||
|
||
Array.to_pandas = _Array_to_pandas | ||
|
||
|
||
def _Array_to_numpy(self: _encoding.Array, *, zero_copy_only: bool = True): | ||
"""Construct a NumPy array from this Vortex array. | ||
This is an alias for :code:`self.to_arrow().to_numpy(zero_copy_only=zero_copy_only)`. | ||
Returns | ||
------- | ||
:class:`numpy.ndarray` | ||
Examples | ||
-------- | ||
Construct an ndarray from a Vortex array: | ||
>>> array = vortex.encoding.array([1, 0, 0, 1]) | ||
>>> array.to_numpy() | ||
array([1, 0, 0, 1]) | ||
""" | ||
return self.to_arrow().to_numpy(zero_copy_only=zero_copy_only) | ||
|
||
|
||
Array.to_numpy = _Array_to_numpy | ||
|
||
|
||
def array(obj: pyarrow.Array | list) -> Array: | ||
"""The main entry point for creating Vortex arrays from other Python objects. | ||
This function is also available as ``vortex.array``. | ||
Parameters | ||
---------- | ||
obj : :class:`pyarrow.Array` or :class:`list` | ||
The elements of this array or list become the elements of the Vortex array. | ||
Returns | ||
------- | ||
:class:`vortex.encoding.Array` | ||
Examples | ||
-------- | ||
A Vortex array containing the first three integers. | ||
>>> vortex.encoding.array([1, 2, 3]).to_arrow() | ||
<pyarrow.lib.Int64Array object at ...> | ||
[ | ||
1, | ||
2, | ||
3 | ||
] | ||
The same Vortex array with a null value in the third position. | ||
>>> vortex.encoding.array([1, 2, None, 3]).to_arrow() | ||
<pyarrow.lib.Int64Array object at ...> | ||
[ | ||
1, | ||
2, | ||
null, | ||
3 | ||
] | ||
Initialize a Vortex array from an Arrow array: | ||
>>> arrow = pyarrow.array(['Hello', 'it', 'is', 'me']) | ||
>>> vortex.encoding.array(arrow).to_arrow() | ||
<pyarrow.lib.StringArray object at ...> | ||
[ | ||
"Hello", | ||
"it", | ||
"is", | ||
"me" | ||
] | ||
""" | ||
if isinstance(obj, list): | ||
return _encoding._encode(pyarrow.array(obj)) | ||
return _encoding._encode(obj) |
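
The `flatten=True` example in the `to_pandas` docstring lifts struct fields into top-level columns named `x.age` and `x.name`. As a rough illustration of that naming scheme only, here is a pure-Python sketch. `flatten_column` is a hypothetical helper; the method in the diff delegates the real work to `pyarrow.Table.flatten()`.

```python
def flatten_column(name, records):
    """Turn one column of dict records into several '<name>.<field>' columns."""
    # Collect field names in first-seen order so every column has a fixed position.
    fields = []
    for record in records:
        for field in record:
            if field not in fields:
                fields.append(field)
    # A record that lacks a field contributes None, keeping the columns aligned.
    return {
        f"{name}.{field}": [record.get(field) for record in records]
        for field in fields
    }


people = [
    {"name": "Joseph", "age": 25},
    {"name": "Narendra", "age": 31},
    {"name": "Angela", "age": 33},
    {"name": "Mikhail", "age": 57},
]
flat = flatten_column("x", people)
```

Each resulting key joins the parent column name and the field name with a dot, matching the column headers shown in the docstring output.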