feat: teach PyArray to compare #1090

Merged
merged 5 commits into from
Oct 21, 2024
Changes from 2 commits
69 changes: 67 additions & 2 deletions pyvortex/src/array.rs
@@ -2,9 +2,9 @@ use arrow::array::{Array as ArrowArray, ArrayRef};
 use arrow::pyarrow::ToPyArrow;
 use pyo3::exceptions::PyValueError;
 use pyo3::prelude::*;
-use pyo3::types::{IntoPyDict, PyList};
+use pyo3::types::{IntoPyDict, PyList, PyString};
 use vortex::array::ChunkedArray;
-use vortex::compute::{slice, take};
+use vortex::compute::{compare, slice, take, Operator};
 use vortex::{Array, ArrayDType, IntoCanonical};

 use crate::dtype::PyDType;
@@ -138,6 +138,71 @@ impl PyArray {
         PyDType::wrap(self_.py(), self_.inner.dtype().clone())
     }

+    /// Point-wise compare the elements of this array to another array.
+    ///
+    /// Parameters
+    /// ----------
+    /// other : :class:`vortex.encoding.Array`
+    ///     An array with whom to compare elements.
+    ///
+    /// operator : :class:`str`
+    ///     One of `eq`, `ne`, `gt`, `ge`, `lt`, or `le` indicating which binary comparison operator
+    ///     to apply.
+    ///
+    /// Returns
+    /// -------
+    /// :class:`vortex.encoding.Array`
+    ///
+    /// Examples
+    /// --------
+    ///
+    /// Compare an array of strings to itself:
+    ///
+    ///     >>> a = vortex.encoding.array(['a', 'b', 'c', 'd'])
+    ///     >>> a.compare(a, "eq").to_arrow_array()
+    ///     <pyarrow.lib.BooleanArray object at ...>
+    ///     [
+    ///       true,
+    ///       true,
+    ///       true,
+    ///       true
+    ///     ]
+    ///
+    /// Compare two arrays containing nulls:
+    ///
+    ///     >>> a = vortex.encoding.array(['dog', None, 'cat', 'mouse', 'fish'])
+    ///     >>> b = vortex.encoding.array(['doug', 'jennifer', 'casper', 'mouse', 'faust'])
+    ///     >>> a.compare(b, 'lt').to_arrow_array()
+    ///     <pyarrow.lib.BooleanArray object at ...>
+    ///     [
+    ///       true,
+    ///       null,
+    ///       false,
+    ///       false,
+    ///       false
+    ///     ]
+    fn compare(&self, other: &Bound<PyArray>, operator: &Bound<PyString>) -> PyResult<PyArray> {
Member

I think this is a very weird Python API. Would this be better as individual functions instead of a string switch?

Member Author

I agree it is weird. PyArrow doesn't implement __lt__ and friends but, AFAICT, nobody has complained about it in their issues. They do implement __eq__. In the compute functions API, they do provide greater and friends.

My best guess is that those operations allocate, so PyArrow exposes them through an API that takes a memory pool parameter. That looks like a way to use jemalloc or mimalloc for Arrow arrays even if the rest of the application uses some other allocator. It could also be used to create an arena for Arrow arrays and do an O(1) free after, say, processing a chunk of data.

Anyway, I think you're right that we should implement the operators. Even if we later provide a way to specify an allocator, the operators can continue to use the default allocator.

Member

@robert3005, Oct 21, 2024

I was mostly thinking that, even if we don't use __eq__ or __compare__, it's better to get rid of the stringly typed operator. FWIW, the C API takes an int for the operator.
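
To make the "stringly typed" concern concrete, here is a minimal sketch of an enum-typed operator argument in plain Python. Everything in it is an assumption for illustration: `Operator` and `compare` are hypothetical stand-ins, not the pyvortex API, and plain lists stand in for arrays. The member values reuse the six strings accepted by the diff, and `None` is treated as null the way the docstring example shows.

```python
import enum
import operator as op
from typing import List, Optional


class Operator(enum.Enum):
    """Hypothetical enum; members mirror the six strings matched in the diff."""

    EQ = "eq"
    NE = "ne"
    GT = "gt"
    GE = "ge"
    LT = "lt"
    LE = "le"


# Map each enum member to the standard-library comparison function.
_FUNCS = {
    Operator.EQ: op.eq,
    Operator.NE: op.ne,
    Operator.GT: op.gt,
    Operator.GE: op.ge,
    Operator.LT: op.lt,
    Operator.LE: op.le,
}


def compare(
    left: List[Optional[str]],
    right: List[Optional[str]],
    operator: Operator,
) -> List[Optional[bool]]:
    """Point-wise comparison of two equal-length lists; None yields None."""
    if not isinstance(operator, Operator):
        raise TypeError(f"expected an Operator, got {type(operator).__name__}")
    if len(left) != len(right):
        raise ValueError("arrays must have the same length")
    fn = _FUNCS[operator]
    return [None if a is None or b is None else fn(a, b) for a, b in zip(left, right)]
```

With this shape, a misspelled member such as `Operator.LTE` fails with an `AttributeError` where it is written, and `Operator("lt")` still converts an existing string when one is needed.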

Member Author

I switched to operators and moved the docs into the examples for technical reasons (see comment).
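
The diff in this view still shows the string-based version, so the operator-based code is not visible here. As a rough sketch of what "switching to operators" can look like from the Python side, the plain-Python class below implements the rich comparison methods point-wise. `SketchArray` is hypothetical and wraps a list; it is not the pyvortex implementation, which is written in Rust.

```python
import operator
from typing import Callable, List, Optional


class SketchArray:
    """Hypothetical stand-in for an array type; wraps a plain Python list."""

    def __init__(self, values: List[Optional[str]]):
        self.values = list(values)

    def _compare(self, other: object, fn: Callable[[str, str], bool]):
        # Returning NotImplemented lets Python try the reflected operation.
        if not isinstance(other, SketchArray):
            return NotImplemented
        if len(self.values) != len(other.values):
            raise ValueError("arrays must have the same length")
        # None stands in for null: a null on either side yields null.
        return SketchArray(
            [
                None if a is None or b is None else fn(a, b)
                for a, b in zip(self.values, other.values)
            ]
        )

    def __eq__(self, other):
        return self._compare(other, operator.eq)

    def __ne__(self, other):
        return self._compare(other, operator.ne)

    def __lt__(self, other):
        return self._compare(other, operator.lt)

    def __le__(self, other):
        return self._compare(other, operator.le)

    def __gt__(self, other):
        return self._compare(other, operator.gt)

    def __ge__(self, other):
        return self._compare(other, operator.ge)
```

One consequence of this design: because `__eq__` returns an array rather than a `bool`, Python sets `__hash__` to `None` for the class, so instances cannot be used as dict keys or set members.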

+        let other = other.borrow();
+        let operator = match operator.extract()? {
+            "eq" => Operator::Eq,
+            "ne" => Operator::NotEq,
+            "gt" => Operator::Gt,
+            "ge" => Operator::Gte,
+            "lt" => Operator::Lt,
+            "le" => Operator::Lte,
+            op => {
+                return Err(PyValueError::new_err(format!(
+                    "expected eq, ne, gt, ge, lt, or le: {}",
+                    op
+                )))
+            }
+        };
+
+        compare(&self.inner, &other.inner, operator)
+            .map(|arr| PyArray { inner: arr })
+            .map_err(PyVortexError::map_err)
+    }

     /// Filter, permute, and/or repeat elements by their index.
     ///
     /// Parameters