Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(geospatial): use geoarrow extension types when returning geometry columns as pyarrow #9549

Merged
merged 17 commits into from
Jul 14, 2024

Conversation

paleolimbot
Copy link
Contributor

Description of changes

Currently, GeoSpatial types are returned to pyarrow as pa.binary(), which drops the SRID and geometry/geography-ness of the column. Returning these columns marked as a GeoSpatial type allows other tools (e.g., lonboard for visualization) to pick this up.

I haven't kicked the tires on this at all yet and I almost certainly haven't thought through the implications here!

Copy link
Contributor

ACTION NEEDED

Ibis follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message.

Please update your PR title and description to match the specification.

@paleolimbot
Copy link
Contributor Author

cc @kylebarron @ncclementi

@kylebarron
Copy link

This looks awesome! Along with #9143, with this PR you should be able to pass an ibis.Table directly into lonboard.viz, because the viz function will see that an __arrow_c_stream__ dunder exists, then be able to interpret the data directly.

Comment on lines 202 to 203
# Warn for dropped CRS? Or geoarrow.types would need a lookup table
# for srid -> PROJJSON
crs = None

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is pyproj a dependency? Is it safe to assume that an srid refers to an EPSG code?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it comes along, when we install pip install 'ibis-framework[duckdb, geospatial]'.

Installing collected packages: pytz, tzdata, typing-extensions, toolz, sqlglot, six, pygments, 
pyarrow-hotfix, parsy,  packaging, numpy, mdurl, duckdb, click, certifi, bidict, attrs, atpublic,
shapely, python-dateutil, pyproj, pyarrow, markdown-it-py, cligj, click-plugins, rich, pandas,
 fiona, ibis-framework, geopandas

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Is it safe to assume that an srid refers to an EPSG code?

It depends where it comes from...a user typing it probably meant EPSG code, but otherwise it would be backend-specific (e.g., queryable using https://postgis.net/docs/using_postgis_dbmanagement.html#spatial_ref_sys_table ). It is usually an EPSG code in most practical use-cases (but could be a user-defined ID).

I'll workshop this a little and see what we can do!

@cpcloud cpcloud changed the title feat: Use GeoArrow extension types when returning geometry columns as pyarrow feat(geospatial): use geoarrow extension types when returning geometry columns as pyarrow Jul 12, 2024
ibis/formats/pyarrow.py Outdated Show resolved Hide resolved
@cpcloud
Copy link
Member

cpcloud commented Jul 13, 2024

@paleolimbot Let us know when you're ready for a review!

@gforsyth gforsyth force-pushed the geospatial-use-geoarrow-maybe branch from 3ba384d to 7d79072 Compare July 13, 2024 17:29
@cpcloud cpcloud added this to the 9.2 milestone Jul 13, 2024
@cpcloud cpcloud added geospatial Geospatial related functionality feature Features or general enhancements labels Jul 13, 2024
@paleolimbot paleolimbot marked this pull request as ready for review July 14, 2024 03:00
@paleolimbot
Copy link
Contributor Author

Let us know when you're ready for a review!

I think it's ready!

We should probably just turn srid into a string and avoid having to do this munging/conversion.

I think that would be better in the long run and would perhaps remove the pieces here where geo-specific details creep in. I think for this PR keeping the srid is fine and maybe less disruptive.

Is pyproj a dependency? Is it safe to assume that an srid refers to an EPSG code?

I've made this assumption for now. I think user-defined CRSes generally have very large srids and an authority/code like USER:90000383. It's been a while since I ran into this (reading Canadian ice cover data from shapefiles that defined a custom lambert conformal conic optimized for the Canadian arctic).

pyproject.toml Outdated Show resolved Hide resolved
@cpcloud cpcloud force-pushed the geospatial-use-geoarrow-maybe branch from cbda02e to 3749b22 Compare July 14, 2024 13:07
Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent!

@@ -64,6 +64,8 @@ db-dtypes = { version = ">=0.3,<2", optional = true }
deltalake = { version = ">=0.9.0,<1", optional = true }
duckdb = { version = ">=0.8.1,<2", optional = true }
geopandas = { version = ">=0.6,<2", optional = true }
geoarrow-types = { version = ">=0.2,<1", optional = true }
pyproj = { version = ">=3.3.0,<4", optional = true }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be sure, we are making pyproj an explicit dependency. I know it comes when installing geopandas, which is already a dependency. I guess explicit better than implicit in this case.

Copy link
Member

@cpcloud cpcloud Jul 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's generally a bad idea to assume that a package will forever and always include a dependency that you require. If there's a top-level import for a package, it needs to be included as an explicit dependency.

@cpcloud cpcloud merged commit cba7367 into ibis-project:main Jul 14, 2024
83 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements geospatial Geospatial related functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants