-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove GeometryArray enum #291
Conversation
@kylebarron Hi, Kyle, this pr bascally passes the build and test, I delete some code which should be reimplemented by |
Sorry I'm really busy right now with preparing for a conference next week, moving to a new apartment this week, and I'm also sick. I implemented #299 , which you'll probably be interested in, which does all the handling for converting a I'll try to give you a closer review in the next couple days |
Sorry, hope you are fine. |
use arrow_schema::Field; | ||
|
||
impl dyn GeometryArrayTrait { | ||
pub fn value<O: OffsetSizeTrait>(&self, i: usize) -> Geometry<O> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This O
seems problematic, as it would require you to specify the offset type in advance, right? But you don't know the type in advance until you look at the data_type
.
In general, I don't think this method should exist, or at least should hardly ever be used, because it would require downcasting on every element instead of once per array, which is contrary to my perf goals.
For algorithms, you should downcast once for the entire array to a strongly-typed array, and then not need dynamic dispatch for the iteration within the array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So impl algorithms for dyn GeometryArrayTrait
like this?
impl BoundingRect for dyn GeometryArrayTrait {
fn bounding_rect(&self) -> RectArray {
match self.data_type() {
GeoDataType::Point(_) => {
let array = as_point_array(self);
let output_geoms: Vec<Option<Rect>> = array
.iter_geo()
.map(|maybe_g| maybe_g.map(|geom| geom.bounding_rect()))
.collect();
RectArray::from(output_geoms)
}
GeoDataType::LineString(_) => {
let array = as_line_string_array::<i32>(self);
let output_geoms: Vec<Option<Rect>> = array
.iter_geo()
.map(|maybe_g| maybe_g.and_then(|geom| geom.bounding_rect()))
.collect();
RectArray::from(output_geoms)
}
_ => unimplemented!()
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should be able to extend the existing macro to handle this. I started some scratch work in #326.
Given that BoundingRect
is implemented for the concrete arrays already, we don't need to repeat the implementation of casting to geo and calling its bounding_rect
method.
That can simplify to
impl BoundingRect for dyn GeometryArrayTrait {
fn bounding_rect(&self) -> RectArray {
match self.data_type() {
GeoDataType::Point(_) => as_point_array(self).bounding_rect(),
GeoDataType::LineString(_) => as_line_string_array::<i32>(self).bounding_rect(),
_ => unimplemented!()
}
}
}
impl TryFrom<(&Field, &dyn Array, bool)> for Box<dyn GeometryArrayTrait> { | ||
type Error = GeoArrowError; | ||
|
||
fn try_from((field, array, is_large): (&Field, &dyn Array, bool)) -> Result<Self, Self::Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of From
implementations on tuples like this, especially beyond two items. I find it very hard to read in practice. I think it would be clearer if it's refactored into a function.
is_large
also shouldn't be necessary, as that can be inferred from the field
(or the array
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_large also shouldn't be necessary, as that can be inferred from the field (or the array).
you mean this can be inferred by arrow datatype? e.g. LargeList vs List?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah exactly
fn process_geometry_n<P: GeomProcessor>( | ||
geometry_column: &GeometryArray<i32>, | ||
geometry_column: &Box<dyn GeometryArrayTrait>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not &dyn GeometryArrayTrait
?
src/algorithm/geo/bounding_rect.rs
Outdated
// TODO: just an example to show how to impl algorithms for dyn GeometryArrayTrait | ||
impl BoundingRect for dyn GeometryArrayTrait { | ||
fn bounding_rect(&self) -> RectArray { | ||
match self.data_type() { | ||
GeoDataType::Point(_) => { | ||
let array = as_point_array(self); | ||
let output_geoms: Vec<Option<Rect>> = array | ||
.iter_geo() | ||
.map(|maybe_g| maybe_g.map(|geom| geom.bounding_rect())) | ||
.collect(); | ||
|
||
RectArray::from(output_geoms) | ||
} | ||
GeoDataType::LineString(_) => { | ||
let array = as_line_string_array::<i32>(self); | ||
let output_geoms: Vec<Option<Rect>> = array | ||
.iter_geo() | ||
.map(|maybe_g| maybe_g.and_then(|geom| geom.bounding_rect())) | ||
.collect(); | ||
|
||
RectArray::from(output_geoms) | ||
} | ||
_ => unimplemented!() | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we'll be able to use a similar macro as we already have here:
geoarrow-rs/src/algorithm/geo/utils.rs
Lines 8 to 47 in 906b5d5
/// Implements the common pattern where a [`GeometryArray`][crate::array::GeometryArray] enum | |
/// simply delegates its trait impl to it's inner type. | |
/// | |
// This is derived from geo https://github.com/georust/geo/blob/d4c858308ba910f69beab175e08af263b17c5f9f/geo/src/types.rs#L119-L158 | |
#[macro_export] | |
macro_rules! geometry_array_delegate_impl { | |
($($a:tt)*) => { $crate::__geometry_array_delegate_impl_helper!{ GeometryArray, $($a)* } } | |
} | |
#[doc(hidden)] | |
#[macro_export] | |
macro_rules! __geometry_array_delegate_impl_helper { | |
( | |
$enum:ident, | |
$( | |
$(#[$outer:meta])* | |
fn $func_name: ident(&$($self_life:lifetime)?self $(, $arg_name: ident: $arg_type: ty)*) -> $return: ty; | |
)+ | |
) => { | |
$( | |
$(#[$outer])* | |
fn $func_name(&$($self_life)? self, $($arg_name: $arg_type),*) -> $return { | |
match self { | |
$enum::Point(g) => g.$func_name($($arg_name),*).into(), | |
// $enum::Line(g) => g.$func_name($($arg_name),*).into(), | |
$enum::LineString(g) => g.$func_name($($arg_name),*).into(), | |
$enum::Polygon(g) => g.$func_name($($arg_name),*).into(), | |
$enum::MultiPoint(g) => g.$func_name($($arg_name),*).into(), | |
$enum::MultiLineString(g) => g.$func_name($($arg_name),*).into(), | |
$enum::MultiPolygon(g) => g.$func_name($($arg_name),*).into(), | |
// $enum::GeometryCollection(g) => g.$func_name($($arg_name),*).into(), | |
$enum::Rect(_g) => todo!(), | |
// $enum::Rect(g) => g.$func_name($($arg_name),*).into(), | |
// $enum::Triangle(g) => g.$func_name($($arg_name),*).into(), | |
} | |
} | |
)+ | |
}; | |
} |
instead of copying the original function content for every branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started scratch work of this in #326
Will create follow up prs to rename GeometryArrayTrait and impl algos for
dyn GeometryArrayTrait