Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove GeometryArray enum #291

Closed
wants to merge 17 commits into from
Closed

Conversation

lewiszlw
Copy link
Contributor

@lewiszlw lewiszlw commented Dec 2, 2023

Will create follow up prs to rename GeometryArrayTrait and impl algos for dyn GeometryArrayTrait

@lewiszlw lewiszlw marked this pull request as draft December 4, 2023 01:47
@lewiszlw lewiszlw marked this pull request as ready for review December 5, 2023 07:53
@lewiszlw
Copy link
Contributor Author

lewiszlw commented Dec 5, 2023

@kylebarron Hi, Kyle, this pr bascally passes the build and test, I delete some code which should be reimplemented by dyn GeometryArrayTrait (I need more time to understand the code). Please give your review when you have time, especially on dyn GeometryArrayTrait.

@kylebarron
Copy link
Member

kylebarron commented Dec 6, 2023

Sorry I'm really busy right now with preparing for a conference next week, moving to a new apartment this week, and I'm also sick. I implemented #299 , which you'll probably be interested in, which does all the handling for converting a WKBArray to a Arc<dyn GeometryArrayTrait>, as long as the WKBArray doesn't have geometry collections (though there are probably still bugs somewhere in the MixedGeometryBuilder). So then implementing operations on Arc<dyn GeometryArrayTrait> means we should be able to do wkb_array.parse_to_geoarrow().unsigned_area() without necessarily implementing algorithms on the WKB Array itself.

I'll try to give you a closer review in the next couple days

@lewiszlw
Copy link
Contributor Author

lewiszlw commented Dec 6, 2023

Sorry, hope you are fine. wkb_array.parse_to_geoarrow() seems a good idea. I'll take a look.

use arrow_schema::Field;

impl dyn GeometryArrayTrait {
pub fn value<O: OffsetSizeTrait>(&self, i: usize) -> Geometry<O> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This O seems problematic, as it would require you to specify the offset type in advance, right? But you don't know the type in advance until you look at the data_type.

In general, I don't think this method should exist, or at least should hardly ever be used, because it would require downcasting on every element instead of once per array, which is contrary to my perf goals.

For algorithms, you should downcast once for the entire array to a strongly-typed array, and then not need dynamic dispatch for the iteration within the array.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So impl algorithms for dyn GeometryArrayTrait like this?

impl BoundingRect for dyn GeometryArrayTrait {
    fn bounding_rect(&self) -> RectArray {
        match self.data_type() {
            GeoDataType::Point(_) => {
                let array = as_point_array(self);
                let output_geoms: Vec<Option<Rect>> = array
                    .iter_geo()
                    .map(|maybe_g| maybe_g.map(|geom| geom.bounding_rect()))
                    .collect();

                RectArray::from(output_geoms)
            }
            GeoDataType::LineString(_) => {
                let array = as_line_string_array::<i32>(self);
                let output_geoms: Vec<Option<Rect>> = array
                    .iter_geo()
                    .map(|maybe_g| maybe_g.and_then(|geom| geom.bounding_rect()))
                    .collect();

                RectArray::from(output_geoms)
            }
            _ => unimplemented!()
        }
    }
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be able to extend the existing macro to handle this. I started some scratch work in #326.

Given that BoundingRect is implemented for the concrete arrays already, we don't need to repeat the implementation of casting to geo and calling its bounding_rect method.

That can simplify to

impl BoundingRect for dyn GeometryArrayTrait {
    fn bounding_rect(&self) -> RectArray {
        match self.data_type() {
            GeoDataType::Point(_) => as_point_array(self).bounding_rect(),
            GeoDataType::LineString(_) => as_line_string_array::<i32>(self).bounding_rect(),
            _ => unimplemented!()
        }
    }
}

Comment on lines +243 to +246
impl TryFrom<(&Field, &dyn Array, bool)> for Box<dyn GeometryArrayTrait> {
type Error = GeoArrowError;

fn try_from((field, array, is_large): (&Field, &dyn Array, bool)) -> Result<Self, Self::Error> {
Copy link
Member

@kylebarron kylebarron Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of From implementations on tuples like this, especially beyond two items. I find it very hard to read in practice. I think it would be clearer if it's refactored into a function.

is_large also shouldn't be necessary, as that can be inferred from the field (or the array).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_large also shouldn't be necessary, as that can be inferred from the field (or the array).

you mean this can be inferred by arrow datatype? e.g. LargeList vs List?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah exactly

fn process_geometry_n<P: GeomProcessor>(
geometry_column: &GeometryArray<i32>,
geometry_column: &Box<dyn GeometryArrayTrait>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not &dyn GeometryArrayTrait?

Comment on lines 35 to 60
// TODO: just an example to show how to impl algorithms for dyn GeometryArrayTrait
impl BoundingRect for dyn GeometryArrayTrait {
fn bounding_rect(&self) -> RectArray {
match self.data_type() {
GeoDataType::Point(_) => {
let array = as_point_array(self);
let output_geoms: Vec<Option<Rect>> = array
.iter_geo()
.map(|maybe_g| maybe_g.map(|geom| geom.bounding_rect()))
.collect();

RectArray::from(output_geoms)
}
GeoDataType::LineString(_) => {
let array = as_line_string_array::<i32>(self);
let output_geoms: Vec<Option<Rect>> = array
.iter_geo()
.map(|maybe_g| maybe_g.and_then(|geom| geom.bounding_rect()))
.collect();

RectArray::from(output_geoms)
}
_ => unimplemented!()
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we'll be able to use a similar macro as we already have here:

/// Implements the common pattern where a [`GeometryArray`][crate::array::GeometryArray] enum
/// simply delegates its trait impl to it's inner type.
///
// This is derived from geo https://github.com/georust/geo/blob/d4c858308ba910f69beab175e08af263b17c5f9f/geo/src/types.rs#L119-L158
#[macro_export]
macro_rules! geometry_array_delegate_impl {
($($a:tt)*) => { $crate::__geometry_array_delegate_impl_helper!{ GeometryArray, $($a)* } }
}
#[doc(hidden)]
#[macro_export]
macro_rules! __geometry_array_delegate_impl_helper {
(
$enum:ident,
$(
$(#[$outer:meta])*
fn $func_name: ident(&$($self_life:lifetime)?self $(, $arg_name: ident: $arg_type: ty)*) -> $return: ty;
)+
) => {
$(
$(#[$outer])*
fn $func_name(&$($self_life)? self, $($arg_name: $arg_type),*) -> $return {
match self {
$enum::Point(g) => g.$func_name($($arg_name),*).into(),
// $enum::Line(g) => g.$func_name($($arg_name),*).into(),
$enum::LineString(g) => g.$func_name($($arg_name),*).into(),
$enum::Polygon(g) => g.$func_name($($arg_name),*).into(),
$enum::MultiPoint(g) => g.$func_name($($arg_name),*).into(),
$enum::MultiLineString(g) => g.$func_name($($arg_name),*).into(),
$enum::MultiPolygon(g) => g.$func_name($($arg_name),*).into(),
// $enum::GeometryCollection(g) => g.$func_name($($arg_name),*).into(),
$enum::Rect(_g) => todo!(),
// $enum::Rect(g) => g.$func_name($($arg_name),*).into(),
// $enum::Triangle(g) => g.$func_name($($arg_name),*).into(),
}
}
)+
};
}

instead of copying the original function content for every branch

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started scratch work of this in #326

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants