Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: add geopackage support to matsim gis #3143

Merged
merged 6 commits into from
Mar 7, 2024

Conversation

nkuehnel
Copy link
Member

@nkuehnel nkuehnel commented Mar 6, 2024

This PR proposes an update to the gis utils package of matsim to not only support .shp files for reading and writing but also the inofficial successor Geopackage .gpkg

The format has a few advantages, especially being self-contained in a single file.
https://towardsdatascience.com/why-you-need-to-use-geopackage-files-instead-of-shapefile-or-geojson-7cb24fe56416

The idea here is to re-use the existing code and use different writers/readers depending on the file suffix (.shp or .gpkg).
The format is supported by an additional geotools plugin which is added to the .pom
The alternative would be to have individual ShapeFileReader/Writer and GeopackageReader/Writer classes instead of the newly refactored GeoFileReader/Writer

I tried to extend the existing tests as well.

Once we have support for gpkg I would also propose to change it in some of the output files, e.g. drt zonal wait stats. This can make it easier to use for streaming output files from cloud storages such as amazon S3, as one does not need to sync multiple files first but can directly stream a single file.

I'm not sure who should/can review this, but I hope someone will :)

edit: sorry for the unnecessary changes in formatting :(

image

@steffenaxer
Copy link
Collaborator

If I might add, shp files have some other limitations which are quite annoying. E.g.

  • Shapefile size limit: 2 GB.
  • Maximum field name length: 10 characters.
  • Maximum number of fields: 1024.
  • ...etc.
    So it make absolute sens to switch to an other format.

@nkuehnel thanks for this proposal!

@mrieser
Copy link
Contributor

mrieser commented Mar 6, 2024

geopackage-files can store multiple "layers" or feature-collections. Which one gets read if only the filename is specified? and what is the layer name when writing a feature collection to a gpkg-file? Browsing through the code I couldn't figure that out.

Otherwise: nice addition!

Not sure if it's worth the effort, but one could think about re-adding ShapeFileReader/Writer that simply delegate to GeofileReader/Writer, but are deprecated, so we don't immediately break other people's code.

@nkuehnel
Copy link
Member Author

nkuehnel commented Mar 7, 2024

@mrieser good points. Currently I was just getting the first layer that was returned from names() - but thinking about it I will just add layer as an additional parameter, which may be null in the shapefile case. In theory it also means we can now group different layers that relate to the same topic in one file (e.g. everything spatial related to one drt mode).
I will update accordingly.

I can also see the point of not breaking the old shapefile classes and will update this is as well

@jfbischoff
Copy link
Collaborator

Nice addition, thanks Nico!
I also agree on switching the defaults to gpk rather sooner than later :)

@nkuehnel
Copy link
Member Author

nkuehnel commented Mar 7, 2024

@mrieser I re-added the ShpFileWriter and reader and set them to deprecated and delegated to the GeoFileWriter/Reader classes, so outside dependants should not break.

In addition, layer names are now set to the featuretype name by default, but there are also methods to set the layer name explicitly.

With that I'd go forward and merge for now. Replacing actual usages will come in later PRs.
Also we might need to revisit re-using gpkg files for adding additional layers etc.

@nkuehnel nkuehnel merged commit 7ff5887 into matsim-org:master Mar 7, 2024
46 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants