Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix WKB writing on big-endian systems #497

Merged
merged 1 commit into from
Dec 5, 2024
Merged

Conversation

QuLogic
Copy link
Contributor

@QuLogic QuLogic commented Nov 17, 2024

shapely.to_wkb (called in pyogrio.geopandas.write_dataframe via geopandas.array.to_wkb) defaults to machine byte order, so simply reading the first byte (of a 4-byte integer geometry type) is broken on big-endian machines. And since some geometry type could be 2 bytes, this also seems broken on little-endian machines, though I did not check if any 2-byte types could be produced.

The commented-out code here indicates the correct course of action (reading the first byte to determine byte order).

This fixes the 190 tests that go through this function and fail on big-endian.

@QuLogic
Copy link
Contributor Author

QuLogic commented Nov 17, 2024

Note, if you'd prefer to not use the Python int.from_bytes, I could write it with bytes and bitshifting. This usually compiles to a single instruction with optimizing compilers (though I'm unsure about the result through Cython.)

@QuLogic
Copy link
Contributor Author

QuLogic commented Nov 17, 2024

Note, if you'd prefer to not use the Python int.from_bytes, I could write it with bytes and bitshifting. This usually compiles to a single instruction with optimizing compilers (though I'm unsure about the result through Cython.)

Since Windows didn't like the int.from_bytes, I decided to implement this. (Probably it just needed the explicit type, but I can't check here.) I also changed wkb_buffer to const as it doesn't need to be written to, and potentially saves a copy using it directly.

Copy link
Member

@brendan-ward brendan-ward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @QuLogic ! Can you please add an entry to the changelog?

Do you have suggestions on how we can test for this in CI? I'm not seeing a straightforward way to do this with Github Actions, and it seems like it would require building the full stack GDAL, GEOS, Shapely, etc for something like s390x.

@QuLogic
Copy link
Contributor Author

QuLogic commented Nov 19, 2024

Do you have suggestions on how we can test for this in CI? I'm not seeing a straightforward way to do this with Github Actions, and it seems like it would require building the full stack GDAL, GEOS, Shapely, etc for something like s390x.

There's definitely no native support, but you could run it in a container (as was done for ARM previously). I would use system packages and you would get all of them pre-compiled (at the expense of not always being the latest). I'm sure Debian exists for s390x, or Fedora as I'm using.

pyogrio/_io.pyx Outdated Show resolved Hide resolved
@QuLogic QuLogic force-pushed the fix-be branch 2 times, most recently from a8a3f73 to 58d437e Compare November 27, 2024 02:50
@rouault
Copy link
Contributor

rouault commented Nov 27, 2024

Do you have suggestions on how we can test for this in CI?

cf OSGeo/gdal#11375 for an example how to do that for C/C++ software

@brendan-ward
Copy link
Member

@QuLogic would you mind please merging in the latest from main? We recently fixed the outstanding CI issues. Once everything shows up green here, I think we can go ahead and merge this, and save implementing specific CI tests for big endian systems for another PR.

`shapely.to_wkb` (called in `pyogrio.geopandas.write_dataframe` via
`geopandas.array.to_wkb`) defaults to machine byte order, so simply
reading the first byte (of a 4-byte integer geometry type) is broken on
big-endian machines. And since some geometry type could be 2 bytes, this
also seems broken on little-endian machines, though I did not check if
any 2-byte types could be produced.

The commented-out code here indicates the correct course of action
(reading the first byte to determine byte order). Instead of using
`int.from_bytes`, I wrote it out as bitshifting operations, which
compile directly to 1 or 2 (if byte swapping) assembly instructions.

Also, change `wkb_buffer` to be `const`, as it doesn't need to be
writeable (and perhaps this will save a copy.)
@QuLogic
Copy link
Contributor Author

QuLogic commented Dec 5, 2024

Looks good now.

Copy link
Member

@brendan-ward brendan-ward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @QuLogic !

@brendan-ward brendan-ward merged commit 57d9346 into geopandas:main Dec 5, 2024
22 checks passed
@QuLogic QuLogic deleted the fix-be branch December 5, 2024 23:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants