PyVortex #729

Merged
merged 12 commits into from
Sep 5, 2024
8 changes: 8 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

8 changes: 8 additions & 0 deletions pyvortex/Cargo.toml
@@ -22,19 +22,27 @@ doctest = false

[dependencies]
arrow = { workspace = true, features = ["pyarrow"] }
flexbuffers = { workspace = true }
futures = { workspace = true }
log = { workspace = true }
paste = { workspace = true }
pyo3 = { workspace = true }
pyo3-log = { workspace = true }
tokio = { workspace = true, features = ["fs"] }
vortex-alp = { workspace = true }
vortex-array = { workspace = true }
vortex-dict = { workspace = true }
vortex-dtype = { workspace = true }
vortex-error = { workspace = true }
vortex-expr = { workspace = true }
vortex-fastlanes = { workspace = true }
vortex-roaring = { workspace = true }
vortex-runend = { workspace = true }
vortex-sampling-compressor = { workspace = true }
vortex-serde = { workspace = true, features = ["tokio"] }
vortex-scalar = { workspace = true }
vortex-zigzag = { workspace = true }
itertools = { workspace = true }

# We may need this workaround?
# https://pyo3.rs/v0.20.2/faq.html#i-cant-run-cargo-test-or-i-cant-build-in-a-cargo-workspace-im-having-linker-issues-like-symbol-not-found-or-undefined-reference-to-_pyexc_systemerror
7 changes: 7 additions & 0 deletions pyvortex/docs/dtype.rst
@@ -0,0 +1,7 @@
Array Data Types
================

.. automodule:: vortex.dtype
   :members:
   :imported-members:

7 changes: 7 additions & 0 deletions pyvortex/docs/encoding.rst
@@ -0,0 +1,7 @@
Arrays
======

.. automodule:: vortex.encoding
   :members:
   :imported-members:
   :special-members: __len__
6 changes: 6 additions & 0 deletions pyvortex/docs/expr.rst
@@ -0,0 +1,6 @@
Row Filter Expressions
======================

.. automodule:: vortex.expr
   :members:
   :imported-members:
8 changes: 6 additions & 2 deletions pyvortex/docs/index.rst
@@ -6,9 +6,13 @@
Vortex documentation
====================

-.. automodule:: vortex
-   :members:
+Vortex is an Apache Arrow-compatible toolkit for working with compressed array data.

.. toctree::
   :maxdepth: 2
   :caption: Contents:

+   encoding
+   dtype
+   io
+   expr
6 changes: 6 additions & 0 deletions pyvortex/docs/io.rst
@@ -0,0 +1,6 @@
Input and Output
================

.. automodule:: vortex.io
   :members:
   :imported-members:
9 changes: 7 additions & 2 deletions pyvortex/pyproject.toml
@@ -5,7 +5,9 @@ description = "Add your description here"
authors = [
{ name = "Nicholas Gates", email = "[email protected]" }
]
-dependencies = []
+dependencies = [
+    "pydata-sphinx-theme>=0.15.4",
+]
requires-python = ">= 3.11"
classifiers = ["Private :: Do Not Upload"]

@@ -17,7 +19,10 @@ build-backend = "maturin"
managed = true
dev-dependencies = [
"pyarrow>=15.0.0",
-    "pip"
+    "pip",
+    "sphinx>=8.0.2",
+    "ipython>=8.26.0",
+    "pandas>=2.2.2",
]

[tool.maturin]
7 changes: 6 additions & 1 deletion pyvortex/python/vortex/__init__.py
@@ -1,4 +1,9 @@
from ._lib import * # noqa: F403
from . import encoding
from ._lib import __doc__ as module_docs
from ._lib import dtype, expr, io

__doc__ = module_docs
del module_docs
array = encoding.array

__all__ = ["array", "dtype", "expr", "io", "encoding"]
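
Review note on `__all__`: Python expects every entry of `__all__` to be a string naming an attribute, and a star import fails with `TypeError` when it meets a module object in the list. The sketch below reproduces this with a stand-in module. The names `demo`, `helper`, and `star_import` are hypothetical, used only for illustration, and are not part of PyVortex.

```python
import sys
import types

# Hypothetical stand-in module; "demo" and "helper" are illustrative names only.
demo = types.ModuleType("demo")
demo.helper = types.ModuleType("demo.helper")
sys.modules["demo"] = demo


def star_import(module_name: str) -> dict:
    """Run `from <module_name> import *` in a fresh namespace and return it."""
    namespace: dict = {}
    exec(f"from {module_name} import *", namespace)
    return namespace


# A module object in __all__ (instead of its name) breaks the star import.
demo.__all__ = [demo.helper]
try:
    star_import("demo")
    bad_all_error = None
except TypeError as exc:
    bad_all_error = exc

# Listing the attribute by name makes the star import succeed.
demo.__all__ = ["helper"]
good_namespace = star_import("demo")
```

Listing the public names as strings keeps `from vortex import *` usable and lets Sphinx and linters read `__all__` without importing anything extra.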
166 changes: 166 additions & 0 deletions pyvortex/python/vortex/encoding.py
@@ -0,0 +1,166 @@
import pyarrow

from ._lib import encoding as _encoding

__doc__ = _encoding.__doc__

Array = _encoding.Array
compress = _encoding.compress


def _Array_to_pandas(self: _encoding.Array, *, name: str | None = None, flatten: bool = False):
    """Construct a Pandas dataframe from this Vortex array.

    Parameters
    ----------
    name : :class:`str`, optional
        The name of the column in the newly created dataframe. If unspecified, use `x`.

    flatten : :class:`bool`
        If :obj:`True`, Struct columns are flattened in the dataframe. See the examples.

    Returns
    -------
    :class:`pandas.DataFrame`

    Examples
    --------

    Construct a :class:`.pandas.DataFrame` with one column named `animals` from the contents of a Vortex
    array:

    >>> array = vortex.encoding.array(['dog', 'cat', 'mouse', 'rat'])
    >>> array.to_pandas(name='animals')
      animals
    0     dog
    1     cat
    2   mouse
    3     rat

    Construct a :class:`.pandas.DataFrame` with the default column name:

    >>> array = vortex.encoding.array(['dog', 'cat', 'mouse', 'rat'])
    >>> array.to_pandas()
           x
    0    dog
    1    cat
    2  mouse
    3    rat

    Construct a dataframe with a Struct-typed column:

    >>> array = vortex.encoding.array([
    ...     {'name': 'Joseph', 'age': 25},
    ...     {'name': 'Narendra', 'age': 31},
    ...     {'name': 'Angela', 'age': 33},
    ...     {'name': 'Mikhail', 'age': 57},
    ... ])
    >>> array.to_pandas()
                                     x
    0    {'age': 25, 'name': 'Joseph'}
    1  {'age': 31, 'name': 'Narendra'}
    2    {'age': 33, 'name': 'Angela'}
    3   {'age': 57, 'name': 'Mikhail'}

    Lift the struct fields to the top level of the dataframe:

    >>> array.to_pandas(flatten=True)
       x.age    x.name
    0     25    Joseph
    1     31  Narendra
    2     33    Angela
    3     57   Mikhail

    """
    name = name or "x"
    # Wrap the single Arrow array in a one-column table so pandas receives a named column.
    table = pyarrow.Table.from_arrays([self.to_arrow()], [name])
    if flatten:
        # Struct fields become top-level columns named `<name>.<field>`.
        table = table.flatten()
    return table.to_pandas()


Array.to_pandas = _Array_to_pandas
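
Review note on the default column name: `name or "x"` falls back to `x` for any falsy value, so an explicitly passed empty string is replaced as well. If that distinction matters, an `is None` check preserves it. A minimal sketch follows, with two hypothetical helpers (`resolve_with_or`, `resolve_with_is_none`) that isolate only the defaulting step and do not depend on Vortex or pyarrow:

```python
from typing import Optional


def resolve_with_or(name: Optional[str] = None) -> str:
    # Mirrors `name = name or "x"`: every falsy value becomes "x".
    return name or "x"


def resolve_with_is_none(name: Optional[str] = None) -> str:
    # Hypothetical alternative: only a missing name becomes "x".
    return "x" if name is None else name


# Compare both strategies on a missing, an empty, and a regular name.
for candidate in (None, "", "animals"):
    print(repr(candidate), repr(resolve_with_or(candidate)), repr(resolve_with_is_none(candidate)))
```

Whether an empty column name should be allowed here is a design choice for the maintainers; the sketch only shows where the two spellings differ.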


def _Array_to_numpy(self: _encoding.Array, *, zero_copy_only: bool = True):
    """Construct a NumPy array from this Vortex array.

    This is an alias for :code:`self.to_arrow().to_numpy(zero_copy_only=zero_copy_only)`.

    Parameters
    ----------
    zero_copy_only : :class:`bool`
        Passed through to :meth:`pyarrow.Array.to_numpy`. If :obj:`True`, the conversion raises
        instead of copying the underlying data.

    Returns
    -------
    :class:`numpy.ndarray`

    Examples
    --------

    Construct an ndarray from a Vortex array:

    >>> array = vortex.encoding.array([1, 0, 0, 1])
    >>> array.to_numpy()
    array([1, 0, 0, 1])

    """
    return self.to_arrow().to_numpy(zero_copy_only=zero_copy_only)


Array.to_numpy = _Array_to_numpy


def array(obj: pyarrow.Array | list) -> Array:
    """The main entry point for creating Vortex arrays from other Python objects.

    This function is also available as ``vortex.array``.

    Parameters
    ----------
    obj : :class:`pyarrow.Array` or :class:`list`
        The elements of this array or list become the elements of the Vortex array.

    Returns
    -------
    :class:`vortex.encoding.Array`

    Examples
    --------

    A Vortex array containing the first three integers.

    >>> vortex.encoding.array([1, 2, 3]).to_arrow()
    <pyarrow.lib.Int64Array object at ...>
    [
      1,
      2,
      3
    ]

    The same Vortex array with a null value in the third position.

    >>> vortex.encoding.array([1, 2, None, 3]).to_arrow()
    <pyarrow.lib.Int64Array object at ...>
    [
      1,
      2,
      null,
      3
    ]

    Initialize a Vortex array from an Arrow array:

    >>> arrow = pyarrow.array(['Hello', 'it', 'is', 'me'])
    >>> vortex.encoding.array(arrow).to_arrow()
    <pyarrow.lib.StringArray object at ...>
    [
      "Hello",
      "it",
      "is",
      "me"
    ]

    """
    if isinstance(obj, list):
        # Let pyarrow infer the element type of a plain Python list.
        return _encoding._encode(pyarrow.array(obj))
    return _encoding._encode(obj)