speed up large polygon intersections using STRtrees and many other things #259

Open
wants to merge 105 commits into base: as/stabletasks
Conversation

@asinghvi17 (Member) commented Feb 14, 2025

Note

This is effectively the GeometryOps dev branch now.

This PR uses STRtrees to substantially accelerate polygon intersection queries. It pulls the actual looping all the way out of the algorithm, and allows STRtrees to be used to skip substantial sections of each polygon.

There are three methods here:

  • naive n*m nested loop over all edge pairs
  • loop over all edges of a, STRtree query on b
  • dual tree query on STRtrees built from both a and b

This algorithm should be easily portable to S2 when that becomes available, and the structure allows e.g. manifold passthrough as well.
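
For illustration, the single-tree variant could look roughly like the sketch below. This is not the PR's actual implementation; edge_extent and query are hypothetical stand-ins for "compute an edge's bounding box" and "return indices of edges of b whose extents may overlap" (e.g. an STRtree query).

# Sketch only, not the PR's implementation. `edge_extent(e)` and
# `query(tree, ext)` are hypothetical helpers: the first returns an edge's
# bounding extent, the second returns indices of edges of b whose extents
# may overlap `ext`.
function single_tree_candidates(f, edges_a, tree_b, edge_extent, query)
    for (i, edge_a) in enumerate(edges_a)
        ext_a = edge_extent(edge_a)
        for j in query(tree_b, ext_a)   # skip whole chunks of b's edges at once
            f(i, j)                     # (i, j) is a candidate intersecting edge pair
        end
    end
end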


This PR also adds a LoopStateMachine module, which defines an Action wrapper struct and a macro that handles those wrapped actions at the call site.

With this macro, a function called within a loop can return Break() or Continue() to indicate to the caller that it should break or continue.

E.g.:

f(i) = i == 3 ? Break() : i
count = 1
for i in 1:5
    count = @processloopaction f(i)
end
count # 2
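
For illustration, a macro like this could be written roughly as below. This is a sketch of the idea, not necessarily the definition in the PR:

# Sketch only: one way a macro like @processloopaction could be implemented.
struct Break end
struct Continue end

macro processloopaction(expr)
    quote
        action = $(esc(expr))
        if action isa Break
            break        # unwind the loop at the expansion site
        elseif action isa Continue
            continue     # skip to the next iteration at the expansion site
        else
            action       # ordinary values pass through unchanged
        end
    end
end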

TODOs:

  • add rejection tests for duplicate points
  • add lopsided single tree query
  • add dual tree query
  • figure out good heuristics for when to choose which
  • add more tests and benchmarks with complex polygons

@asinghvi17 (Member Author)

@rafaqz this is where I want to make it super easy to have a DimMatrix of benchmarks, to test scaling dynamics....

@asinghvi17 changed the title from "speed up large polygon intersections using an STRtree on poly b" to "speed up large polygon intersections using STRtrees" on Feb 14, 2025
- Move STRDualQuery and the utils to utils/
- Create a new module LoopStateMachine that just defines some actions, and a macro @processloopactions that you can use to break out of a loop from within a `map_some_things(f1, f2, f3, ...)`-style call, if you want.

Add tests for LoopStateMachine
Use LoopStateMachine in the maybe function I added to clipping processor
@asinghvi17 (Member Author)

I just realized, the tests aren't actually testing anything :D

@asinghvi17 asinghvi17 added the enhancement New feature or request label Feb 14, 2025
@asinghvi17 asinghvi17 self-assigned this Feb 14, 2025
* lay off 3/4 of the testsets

this uses the ContextTestSet from Test to reduce the amount of visual cruft. It doesn't reduce it all the way, but it can help (see the sketch after this list).

Next step is to have a custom testset that can elide the excessive display.

* print better context

* also test densified geoms
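
For reference, the ContextTestSet mentioned above is what `@testset let ... end` builds on. A minimal sketch, assuming Julia ≥ 1.9; the binding below is made up:

using Test

# On failure, the bound value is printed as context alongside the failing
# @test, instead of needing a whole nested @testset per geometry.
@testset let geom_description = "densified polygon #3"
    @test 1 + 1 == 2
end
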
@asinghvi17 (Member Author)

See #260 for the testset diff

@asinghvi17 (Member Author)

Seems the densified module isn't actually testing anything...

@asinghvi17 (Member Author)

The last 4 commits before this comment implement the natural tree from tg as well as a spatial tree interface that should be valid for any spatial tree. The functions that SpatialTreeInterface enables are sufficiently generic that they should be able to work for any number of dimensions and potentially even on the sphere.
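
To illustrate the kind of genericity meant here, a depth-first query over any tree that exposes extents could look like the following sketch. The SketchNode type and its fields are illustrative assumptions, not the interface the PR defines:

using Extents

# Illustrative stand-in for "any spatial tree node": an extent, child nodes
# (empty at leaves), and (index, extent) entries stored at leaves.
struct SketchNode
    extent::Extents.Extent
    children::Vector{SketchNode}
    entries::Vector{Tuple{Int, Extents.Extent}}
end

isleaf(node::SketchNode) = isempty(node.children)

# Generic depth-first query: prune whole subtrees whose extent fails the
# predicate, and call f(i) for every leaf entry whose extent passes it.
# Nothing here depends on dimensionality, so the same pattern could work
# on the sphere given spherical extents.
function depth_first_query(f, predicate, node::SketchNode)
    predicate(node.extent) || return nothing
    if isleaf(node)
        for (i, ext) in node.entries
            predicate(ext) && f(i)
        end
    else
        foreach(child -> depth_first_query(f, predicate, child), node.children)
    end
    return nothing
end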

The natural tree has been tested doing a scanline query for Boston's coordinates against the USA polygon (35,000 vertices) in ~80 ns! That's insane timing. It ought to bring our point-in-polygon check to a similar speed as the C library's, if we don't use exact predicates.

@asinghvi17 (Member Author) commented Feb 28, 2025

With preparation we can be 2.5 orders of magnitude faster; without it, the indexing approach is 2x slower, at least for now. There are likely memory-allocation efficiencies I am not yet exploiting, and the GeometryOps implementation is not as cache-friendly as the tg implementation.

# Original

julia> @be GO.contains($(usa_poly), $((-76, 42)))
Benchmark: 3590 samples with 4 evaluations
 min    5.729 μs
 median 6.604 μs
 mean   6.426 μs
 max    16.208 μs

# After ExactPredicates
@be GO.contains($(usa_poly), $((-76, 42)))
Benchmark: 3672 samples with 4 evaluations
 min    5.708 μs
 median 6.396 μs
 mean   6.235 μs
 max    8.198 μs

# After NaturalIndex (this IS constructing the natural index as well)
# this is substantially slower than brute force!
julia> @be GO.contains($(usa_poly), $((-76, 42)))
Benchmark: 1359 samples with 1 evaluation
 min    22.291 μs (11 allocs: 403.688 KiB) # compared to 14 microseconds for TG given a GI geom
 median 38.500 μs (11 allocs: 403.688 KiB) # compared to 38 microseconds for TG given a GI geom
 mean   58.483 μs (11 allocs: 403.688 KiB, 1.34% gc time)
 max    3.341 ms (11 allocs: 403.688 KiB, 98.75% gc time)

# With pre-computed natural index on usa_poly
# 2.5 OOM faster!
julia> @be GO.contains($(GO.prepare_naturally(usa_poly)), $((-76, 42)))
Benchmark: 3629 samples with 66 evaluations
 min    341.545 ns (1 allocs: 16 bytes)
 median 381.318 ns (1 allocs: 16 bytes)
 mean   390.878 ns (1 allocs: 16 bytes)
 max    901.515 ns (1 allocs: 16 bytes)
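
The takeaway is that preparation pays off whenever the index can be reused across queries; for example (illustrative point set, same calls as the benchmarks above):

# Build the natural index once, then amortize it over many queries.
prepared = GO.prepare_naturally(usa_poly)
scanline = [(-130.0 + 0.06i, 42.0) for i in 1:1000]      # made-up test points
n_inside = count(p -> GO.contains(prepared, p), scanline)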

@rafaqz (Member) commented Feb 28, 2025

So fast!!

I guess we can reduce those allocs a bit or something

@asinghvi17 (Member Author) commented Mar 1, 2025

Turns out intersection_point tests are failing because the algorithm is returning strange intersection points...

@asinghvi17 (Member Author)

Brief experiment to confirm whether a lazy generator edge list has equivalent speed to a non-lazy version:

# Setup code
import GeoInterface as GI
import GeometryOps as GO
import GeometryOps.SpatialTreeInterface as STI
using Extents

# DIY baseline: compute each edge's extent on the fly and call f(i)
# for every edge whose extent satisfies pred.
function doityourself(f, pred, geom)
    p1 = GI.getpoint(geom, 1)
    for i in 2:GI.npoint(geom)
        p2 = GI.getpoint(geom, i)
        ext = Extent(X = extrema(GI.x, (p1, p2)), Y = extrema(GI.y, (p1, p2)))
        pred(ext) && f(i)
        p1 = p2
    end
end


# get data together
using NaturalEarth

all_countries = naturalearth("admin_0_countries", 10)
usa_multipoly = all_countries.geometry[findfirst(==("United States of America"), all_countries.NAME)] |> GO.tuples # x -> GI.convert(LG, x) |> LG.makeValid |> GO.tuples
usa_poly = GI.getgeom(usa_multipoly, findmax(GO.area.(GI.getgeom(usa_multipoly)))[2]) # isolate the poly with the most area
usa_ring = GI.getexterior(usa_poly)


# now try things
using Chairmarks
# benchmark: find all edges that cross 42 degrees latitude
tree_lazy = STI.FlatNoTree(GO.lazy_edge_extents(usa_ring))
tree_eager = STI.FlatNoTree(GO.edge_extents(usa_ring))
tree_natural = GO.NaturalIndexing.NaturalIndex(GO.lazy_edge_extents(usa_ring))

Natural indexing (fastest, of course)

julia> @be Int[] STI.do_query(Base.Fix1(push!, _), $(ext -> ext.Y[1] <= 42 <= ext.Y[2]), $tree_natural)
Benchmark: 2717 samples with 126 evaluations
 min    205.357 ns (0.06 allocs: 177.524 bytes)
 median 224.532 ns (0.06 allocs: 177.524 bytes)
 mean   276.546 ns (0.06 allocs: 177.524 bytes, 0.11% gc time)
 max    36.939 μs (0.06 allocs: 177.524 bytes, 98.80% gc time)

Lazy extent list in FlatNoTree (what it says on the tin - no tree at all)

julia> @be Int[] STI.do_query(Base.Fix1(push!, _), $(ext -> ext.Y[1] <= 42 <= ext.Y[2]), $tree_lazy)
Benchmark: 4073 samples with 2 evaluations
 min    11.188 μs (1 allocs: 232 bytes)
 median 11.229 μs (1 allocs: 232 bytes)
 mean   11.490 μs (1 allocs: 232 bytes)
 max    48.896 μs (1 allocs: 232 bytes)

Materialized vector of extents in eager tree

julia> @be Int[] STI.do_query(Base.Fix1(push!, _), $(ext -> ext.Y[1] <= 42 <= ext.Y[2]), $tree_eager)
Benchmark: 4983 samples with 3 evaluations
 min    5.611 μs (0.67 allocs: 154.667 bytes)
 median 6.194 μs (0.67 allocs: 154.667 bytes)
 mean   6.200 μs (0.67 allocs: 154.667 bytes)
 max    15.597 μs (0.67 allocs: 154.667 bytes)

DIY solution - compute extent and check on the fly

julia> @be Int[] doityourself(Base.Fix1(push!, _), $(ext -> ext.Y[1] <= 42 <= ext.Y[2]), $(usa_ring))
Benchmark: 4080 samples with 2 evaluations
 min    11.146 μs (1 allocs: 232 bytes)
 median 11.188 μs (1 allocs: 232 bytes)
 mean   11.332 μs (1 allocs: 232 bytes)
 max    21.354 μs (1 allocs: 232 bytes)

It's clear that the lazy indexing takes the same amount of time as the DIY solution, meaning we can use the lazy edge-extent generator without paying any overhead over writing the loop by hand.

@asinghvi17 (Member Author)

Just pulled natural trees into the foreach_maybe_intersecting_pair_of_points function. Double trees are crazy!!!

With SingleSTRTree plus extent thinning:

julia> @be GO.intersection_points($usa_poly, $usa_reflected)
Benchmark: 37 samples with 1 evaluation
 min    1.967 ms (32601 allocs: 3.785 MiB)
 median 2.214 ms (32601 allocs: 3.785 MiB)
 mean   2.729 ms (32601 allocs: 3.785 MiB, 7.83% gc time)
 max    8.556 ms (32601 allocs: 3.785 MiB, 65.28% gc time)

With SingleNaturalTree plus extent thinning (~2x performance improvement)

julia> @be GO.intersection_points($(GO.Planar()), $(GO.SingleNaturalTree()), $usa_poly, $usa_reflected)
Benchmark: 50 samples with 1 evaluation
 min    1.087 ms (30294 allocs: 2.958 MiB)
 median 1.178 ms (30294 allocs: 2.958 MiB)
 mean   2.004 ms (30294 allocs: 2.958 MiB, 8.10% gc time)
 max    26.876 ms (30294 allocs: 2.958 MiB, 94.69% gc time)

With SingleNaturalTree and no extent thinning

julia> @be GO.intersection_points($(GO.Planar()), $(GO.SingleNaturalTree()), $usa_poly, $usa_reflected)
Benchmark: 57 samples with 1 evaluation
 min    1.086 ms (30230 allocs: 4.443 MiB)
 median 1.316 ms (30230 allocs: 4.443 MiB)
 mean   1.754 ms (30230 allocs: 4.443 MiB, 10.87% gc time)
 max    6.512 ms (30230 allocs: 4.443 MiB, 74.14% gc time)

So it doesn't seem like extent thinning helps much here, maybe a ~100 microsecond difference. Will now try a double-tree query with natural trees.

Dual tree queries

Since the cost of creating a tree has gone way down with SingleNaturalTree, we can do dual tree queries easily. It turns out that extent thinning takes too long on natural trees, so we can skip it there.
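
For context, a dual-tree query descends both trees simultaneously and prunes any pair of nodes whose extents don't intersect. A rough sketch using the illustrative SketchNode type from the earlier comment (not the PR's actual code):

# Sketch only: prune pairs of subtrees whose extents do not intersect, and
# report (i, j) candidate edge pairs from the leaf entries that remain.
function dual_tree_query(f, a::SketchNode, b::SketchNode)
    Extents.intersects(a.extent, b.extent) || return nothing
    if isleaf(a) && isleaf(b)
        for (i, ea) in a.entries, (j, eb) in b.entries
            Extents.intersects(ea, eb) && f(i, j)
        end
    elseif isleaf(a)                 # only descend the side that still has children
        foreach(child -> dual_tree_query(f, a, child), b.children)
    else
        foreach(child -> dual_tree_query(f, child, b), a.children)
    end
    return nothing
end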

With no extent thinning:

julia> @be GO.intersection_points($(GO.Planar()), $(GO.DoubleNaturalTree()), $usa_poly, $usa_reflected) seconds=1
Benchmark: 1155 samples with 1 evaluation
 min    485.375 μs (2283 allocs: 5.967 MiB)
 median 606.000 μs (2283 allocs: 5.967 MiB)
 mean   791.566 μs (2283 allocs: 5.967 MiB, 11.66% gc time)
 max    5.663 ms (2283 allocs: 5.967 MiB, 88.21% gc time)

With extent thinning on both sides:

julia> @be GO.intersection_points($(GO.Planar()), $(GO.ThinnedDoubleNaturalTree()), $usa_poly, $usa_reflected) seconds=1
Benchmark: 1186 samples with 1 evaluation
 min    582.583 μs (2368 allocs: 2.744 MiB)
 median 693.104 μs (2368 allocs: 2.744 MiB)
 mean   798.452 μs (2368 allocs: 2.744 MiB, 5.87% gc time)
 max    5.503 ms (2368 allocs: 2.744 MiB, 78.88% gc time)

@rafaqz (Member) commented Mar 1, 2025

Yeah, extent thinning was speeding up the huge sort for STRtree but maybe there's less to speed up for natural.

But that looks 4x faster than before! So, prob faster than GEOS in all cases?

@asinghvi17 (Member Author)

Unfortunately not, it's still about the same speed for the circle case and the super-segmentized collection of squares. But for natural polygons it seems pretty good, and more efficient than computing an STR tree for sure.

@asinghvi17 (Member Author)

huh - turns out there are situations in which dual-tree is slower than single tree by a substantial amount. I guess it's good we have options then!
