-
Notifications
You must be signed in to change notification settings - Fork 661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-3026: spatial index per graph and kryo serialization #3027
base: main
Are you sure you want to change the base?
Conversation
0829aa5
to
01bad5d
Compare
It should build now. Its about 90% done. One issue is, that @LorenzBuehmann used Right now I switched to JTS native WKB serialization which makes the build work normally (in principle, the geometry-serializer classname could be made part of a spatial index metadata header - such a header does not exist yet). I also added a little compatibility layer for the old index IO which used java serialization. So old index files should continue to load. Saving indexes will use the new serializer. |
The dependencies will need to be checked - kryo is 3-clause BSD so that has implications. (I haven't looked at the new dependencies, just kyro.) Apache Sedona has some entries in its LICENSE
These are a nuisance because they are not guaranteed to be maintained by dependency updates. 😞 |
About performance differences, @SimonBin found this article: https://kontinuation.one/posts/2022/12/improving-geometry-serde-performance-of-apache-sedona/ This suggests that Sedona's ShapeSerde is generally a lot faster. Not sure if the current implementation is perhaps already even one of the optimized variants. |
Some notes about the remaining issues I am working on to resolve. I am a bit worried that a hard dependency on sedona-spark may cause too many issues, so I am trying to decouple the geometry serde (serializer/deserializer) from the rest of the spatial-index serde. For this, I added a plain text JSON header field to the spatial index format with a field for the geometry serde. This field can be read and used to configure the remaining kryo serde. The index reader makes a lookup with Class.forName() and tries to create an instance via the default constructor. With this approach it would be possible to use the slower JTS-based geometry serialization for now and have a compatible upgrade path to a faster approach at a later point. Perhaps the geometrySerde header field is not needed, because I see that in Sedona the implementation is serializable and apparently their approach is to just read/write the geometry serde instance out using plain Java serialization during the kryo serialization process. So I need to check what is the best way, but something along these lines should work to support the same spatial-index serde with different geometry serdes in a compatible way. This snippet is an example for how the header could be set up: // SpatialIndexIoKryo.java
public static void writeToOutputStream(OutputStream os, SpatialIndexPerGraph index) {
// geometrySerde could later be switched to sedona's ShapeSerde.
// (Need to check whether I can readily reuse an interface
// from JTS/Sedona instead of my own GeometrySerdeAdapter)
GeometrySerdeAdapter geometrySerde = new GeometrySerdeAdapterJtsWkb();
JsonObject header = new JsonObject();
header.addProperty("type", "jena-spatial-index");
header.addProperty("version", "2.0.0");
header.addProperty("geometrySerde", geometrySerde.getClass().getName());
Kryo kryo = new Kryo();
JtsKryoRegistrator.registerClasses(kryo, geometrySerde);
try (Output output = new Output(os)) {
writeJson(output, header);
writeIndex(kryo, output, index);
output.flush();
} |
b0a9b8b
to
ae4ba13
Compare
/* This file is an adapted copy of org.locationtech.jts.index.strtree.IndexSerde. */ | ||
/* The change is, that geometry read/write is delegated to a GeometrySerdeAdapter instance. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is copy&adapt from sedona-common 1.7.0 (latest)- they actually used this package hack to access non-visible stuff. in the long run, jts should fix this design.
...c/main/java/org/apache/jena/geosparql/spatial/index/v2/serde/sedona/CustomGeometrySerde.java
Outdated
Show resolved
Hide resolved
...in/java/org/apache/jena/geosparql/spatial/index/v2/serde/sedona/CustomSpatialIndexSerde.java
Outdated
Show resolved
Hide resolved
So I had to copy and adapt a total of They all have an Apache V2 header - so it should be fine? |
The spatial index does not store any geometries - just their envelopes and feature nodes - so geometry serialization performance is not an issue in the first place 🤦♂️ - but good having talked about it 😀
|
52317c9
to
fcae47a
Compare
I now also added the benchmark that evaluates loading a spatial index from disk - so w.r.t. index loading during server startup, the kryo version is (still) significantly faster.
A geometry mix comprises 8 different geometry types (point, linestring, linearring, polygon, multipoint, multilinestring, multipolygon, geometrycollection). So the actual number of envelopes in the spatial index is |
@afs What would be the best way to add a Fuseki module for re-building the index for selected graphs? In an ideal world, the spatial index would be synchronized with the dataset, but for the time being we created a little Fuseki-Mod, where one can select the graphs for which to rebuild the index. Back then we forked your This is what the spatial index rebuilder looks like. The html is currently a single self-contained file (ai generated) - so no extra frontend build is needed. |
GitHub issue resolved #3026
Pull request Description: Adds support for spatial index per graph and kryo serialization.
By submitting this pull request, I acknowledge that I am making a contribution to the Apache Software Foundation under the terms and conditions of the Contributor's Agreement.
See the Apache Jena "Contributing" guide.