shapely version of burst boundary processing #2

scottstanie · 2022-10-25T22:31:18Z

I'm putting this up as possible suggestions for ways to fix #1 and #3 . While I made it a separate script, it could be merged into the build_database, I just was tinkering to better understand some of the logic you had implemented and (for instance) why ESAs multipolygons were not actually working when I ran spatialite functions on them in the database. I'll let you handle the long-term fixes however you'd like.

The main notable changes are

For the antimeridian centroids, I convert the lat/lon multipolygon to UTM, get the centroid, then convert back (which fixes Potential bug on build_database.wkt2extent() #1 )
for the snap calculation, I'm now using floor/ceil instead of round, as was done in @vbrancat 's stack processing script.
In the main database, I create the geometry column using the spatialite method SELECT AddGeometryColumn(...)
Not important for the bounding box lookup, but I'm converting the WKB geometries to 2D. For some unknown reason, could not make spatialite work with the 3D multipolygon that ESA gives.

Features which could be nice to keep, even if you don't use this script:

I also make a minimal version with just burst_id_jpl, epsg, and the 4 bound columns. This makes the DB size only 72 Mb
I added an index to burst_id_jpl, which makes each lookup take <1ms instead of ~200 ms (this becomes more noticeable for stack processing when you have 50 frames x 30 bursts = 1500 bursts to lookup)
I added a download function for the original ESA database from their website
I added the command line parser regenerate with different snap/margin/output filename options
by loading and processing the geometries in python, it takes around ~1min on aurora instead of ~10 min to build the database

This reverts commit 67648c1.

seongsujeong

Thanks @scottstanie for this wonderful PR. That would greatly improve the performance when generating the burst database.

I've left some comments between the codes below.

My major concern is that I wanted to have the neighboring bursts in the three subswath to have the same EPSG. If I am understanding this code correctly, however, it looks like this PR calculated the EPSG burst by burst.

seongsujeong · 2022-11-23T19:57:13Z

utils.py

+        # antimeridian case
+        # Transform to the first UTM zone, get the centroid, then transform back
+        # Doesnt matter if it's the right EPSG; just need it to be in
+        # projected coordinates to make the centroid easy
+        utm_epsg = 32601 if geom.centroid.y > 0 else 32701
+        src, dst = 4326, utm_epsg
+        point = transform(transform(geom, src, dst).centroid, dst, src)


I'm not sure if that is the reliable way to calculate the centroid. Even the burst coordinates in UTM 1N has discontinuities on equator (so does every UTMs). Also there would be lots of distortions on the polygons that crosses the equator. See the attached image below.

The "antimeridian case" was already taken care of as shown in the code snipped below. The idea is basically "wrap" the longitudinal coordinates when we detect multiple polygons, calculate centroid individually, and weight-sum the coordinates by the areas of each polygons.

burst_db/build_database.py

Lines 290 to 333 in ced146b

elif num_polygon > 1:

offset_circular = 360.0

xy_weight_centroid = np.zeros((num_polygon, 3))

for id_polygon, nodes_polygon in enumerate(dict_geometry['coordinates']):

dict_sub_polygon={

"type": "MultiPolygon",

"coordinates":[nodes_polygon]}

json_sub_polygon = json.dumps(dict_sub_polygon)

geom_sub_polygon = ogr.CreateGeometryFromJson(json_sub_polygon)

centroid_sub_polygon = geom_sub_polygon.Centroid()

dict_centroid_sub_polygon = json.loads(centroid_sub_polygon.ExportToJson())

x_centroid = dict_centroid_sub_polygon['coordinates'][0]

y_centroid = dict_centroid_sub_polygon['coordinates'][1]

area_sub_polygon = geom_sub_polygon.Area()

xy_weight_centroid[id_polygon,:] = [(x_centroid + offset_circular) % offset_circular,

y_centroid,

area_sub_polygon]

# Weighted sum

x_centroid_weighted_raw = \

np.sum(xy_weight_centroid[:,0] * xy_weight_centroid[:,2])\

/ np.sum(xy_weight_centroid[:,2])

y_centroid_weighted_raw = \

np.sum(xy_weight_centroid[:,1] * xy_weight_centroid[:,2])\

/ np.sum(xy_weight_centroid[:,2])

# Refine the raw result of the weighted sum coordinates

if x_centroid_weighted_raw > 180.0:

x_centroid = x_centroid_weighted_raw - offset_circular

elif x_centroid_weighted_raw < -180.0:

x_centroid = x_centroid_weighted_raw + offset_circular

else:

x_centroid = x_centroid_weighted_raw

if y_centroid_weighted_raw > 180.0:

y_centroid = y_centroid_weighted_raw - offset_circular

elif y_centroid_weighted_raw < -180.0:

y_centroid = y_centroid_weighted_raw + offset_circular

else:

y_centroid = y_centroid_weighted_raw

Once we decide the proper EPSG, I think the correct boundary of the burst can be calculated by taking care of the multiple polygons (or using shapely functions as you did here

that is a very pleasing figure haha. how'd you make it?

I think that even though it would be terrible to warp normal points from all over the globe into UTM 1N, doing the warp -> centroid -> inverse warp shouldn't distort it too much? I'm gonna draw myself a figure since i'm unsure, but i'll double check and compare to the EPSG codes from the current version of your DB

well i've confused myself, since it seems like the warping might not necessarily get the right centroid if it's a really nonlinear bad warping... but I'm not sure which (if any) bursts it messed it up for?

burst_id_jpl,EPSG,epsg
t001_000110_iw1,32631,32630
t001_000111_iw1,32631,32630
t001_000112_iw1,32631,32630
t001_000113_iw1,32631,32630

import pandas as pd import sqlite3 import os os.chdir("/home/staniewi/dev") db1 = "burst_map_IW_000001_375887.OPERA-JPL.20220818_113416.sqlite3" # made from original version db2 = "burst_map_margin3000.sqlite3" # from new version with sqlite3.connect(db1) as con: df1 = pd.read_sql("select * from burst_id_map", con) with sqlite3.connect(db2) as con: df2 = pd.read_sql("select * from burst_id_map", con) mismatches = df1.EPSG != df2.epsg print(df1.shape, df2.shape) # (1127661, 13) (1127661, 13) print(mismatches.sum()) # 104617 df_merged = pd.merge(df1.loc[mismatches, ['burst_id_jpl','EPSG', ]], df2.loc[mismatches, ['epsg', 'burst_id_jpl']], on="burst_id_jpl") df_merged.to_csv("~/dev/burst_diffs.csv", index=False)

Looking at the mismatches, a lot are just off by one, which might be either bursts on the borderline of two UTM zones getting a slightly different centroid, or might be your keeping adjacent subswaths in the same UTM zone:

(burst id, then your version of the EPSG, then my script's output)

burst_id_jpl,EPSG,epsg t001_000110_iw1,32631,32630 t001_000111_iw1,32631,32630 t001_000112_iw1,32631,32630 ... (lots more like this) t001_000448_iw1,3413,32627 t001_000449_iw1,3413,32627 (more on the borderline between polar and non-polar regions)

The biggest differences i've seen look like this:

t001_000669_iw2,32612,32601 t001_000669_iw3,32612,32660 t001_000670_iw1,32614,32601 t001_000670_iw2,32614,32601

where these bursts on the anti-meridian are off due to what you pointed out in #1 .

Maybe you are taking a look at the old version of the database. In the updated algorithm, both your and my db have EPSG 32601 for t001_000670_iw1. I'll upload it to the server and send you the link. Or you can pull the changes, and run it on your side.

Regarding the EPSG numbers whose difference is 1, or UTM / PS: I think that is from the difference in the detail. I determine the EPSG from IW2, and assign the same projection for the corresponding bursts in IW1 and IW3. That can cause difference in the projection between yours and mine, which can be especially found around the "projection border"

Therefore I would not be surprised if there are lots of IW1 or IW3 bursts with different EPSG. In that case, comparing the projection only for IW2 bursts would make sense. I've calculated the EPSGs from your and my implementation, and the only occasion that were different was t037_077712_iw2 My DB had 32601, and yours was 32660. That difference is acceptable because they are in fact right next to each other in geographic coordinates.

Also regarding the distortion of polygon bursts on UTM1N (in the figure above): I was bit confused when thinking about it. I've checked that you apply "warp-centroid-warp_back" only for Antimeridian case i.e. multiple polygon in feature's geometry. Using UTM 1N would be okay in that case. Sorry for the false alarm.

make_db.py

scottstanie · 2022-12-07T02:12:33Z

My major concern is that I wanted to have the neighboring bursts in the three subswath to have the same EPSG. If I am understanding this code correctly, however, it looks like this PR calculated the EPSG burst by burst.

We can change that, but what would be the benefit of having IW1,2,3 in the same EPSG? I thought the two options we were considering were

geocode burst by burst in the most appropriate UTM projection by finding the burst centroid.
create the frame definition in advance so that all ~30 bursts in one frame have the same EPSG, and decide the frame EPSG based on the frame center.

if we're doing things burst-wise, then it seems like (as an example) burst t001_000001_iw1 is no more related to t001_000001_iw2 than it is to t001_000002_iw1 (the next one along track), but was there something you or @hfattahi thought of that created a good reason to keep iw1/2/3 together?

seongsujeong · 2022-12-07T06:21:51Z

My major concern is that I wanted to have the neighboring bursts in the three subswath to have the same EPSG. If I am understanding this code correctly, however, it looks like this PR calculated the EPSG burst by burst.

We can change that, but what would be the benefit of having IW1,2,3 in the same EPSG? I thought the two options we were considering were

geocode burst by burst in the most appropriate UTM projection by finding the burst centroid.

create the frame definition in advance so that all ~30 bursts in one frame have the same EPSG, and decide the frame EPSG based on the frame center.

if we're doing things burst-wise, then it seems like (as an example) burst t001_000001_iw1 is no more related to t001_000001_iw2 than it is to t001_000002_iw1 (the next one along track), but was there something you or @hfattahi thought of that created a good reason to keep iw1/2/3 together?

I think we have to ultimately do the 2. i.e. frame-based EPSG definition, as we have discussed in the ADT meeting. But I'm not sure if we have the frame definition now. Please let me know if we do.
Until we have it, I think we have to stick with the EPSG designation strategy that is more stitch-friendly.

For example let's think about this case (actual ESPG might differ)

burst_id_jpl	EPSG
`t001_000001_iw1`	32614
`t001_000001_iw2`	32614
`t001_000001_iw3`	32615
`t002_000001_iw1`	32615

The first three bursts are supposed to be stitch-able because they are from the same orbit. However, if we do not have the same projection for them above, it would be very tricky to do mosaic them because we need to resample the third one.

Would we stitch the last two bursts in the table? no. because they are not from the same orbit. Therefore I think it makes more sense to determine the projection from "burst triplet", rather than each individual bursts' centroids.

scottstanie · 2023-01-12T20:29:38Z

this was closed in favor of #8 , which implements a superset of the features here.

scottstanie added 28 commits October 21, 2022 13:24

testing fix for idl

67648c1

add a pandas/sql version of database creation

855c2d4

Revert "testing fix for idl"

1777f72

This reverts commit 67648c1.

add esa download, use timestr

4ee5b37

download esa if not exists

850f509

add argparse

bdb0496

typo

180d5f0

add limit arg

051070a

typo

7a17aa5

fix args

ec5fc93

fix db updates

bf5f342

fix sql for real

6b422db

bug in spatialite

293122d

new version in python again

309509a

add separate utils

4f0db8e

add other opts

58470e4

use partial for kwargs

0248339

remove sql version

892aead

pass through snap margin

462ff4c

drop burst id, not rel orbit

1fe05ec

espg typo

81eb296

add a minimal db version

83ee338

change snap to floor/ceil

7834ab7

Merge branch 'main' of github.com:scottstanie/burst_db

9bf99c7

print the minimal name

94c135d

pass through margin

4942a9a

default=4000

7a64fd3

Merge branch 'main' of github.com:scottstanie/burst_db

b5a2e35

scottstanie mentioned this pull request Oct 25, 2022

[Bug]: burst bounds are not conservative enough with margin=1000 #3

Closed

scottstanie requested a review from seongsujeong October 26, 2022 14:52

scottstanie added 3 commits October 28, 2022 11:23

dont drop esa burst_id or subswath name in big db

67d8764

add setup and reqs

888134e

ensure bbox is saved as INT, not REAL

7f0ae4e

seongsujeong reviewed Dec 6, 2022

View reviewed changes

seongsujeong mentioned this pull request Dec 8, 2022

Across-track EPSG constraint #5

Closed

scottstanie added 2 commits December 9, 2022 13:11

Merge branch 'main' of github.com:opera-adt/burst_db

6d20f20

Merge branch 'main' of github.com:opera-adt/burst_db

c90572e

scottstanie closed this Jan 12, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

shapely version of burst boundary processing #2

shapely version of burst boundary processing #2

scottstanie commented Oct 25, 2022 •

edited

Loading

seongsujeong left a comment

seongsujeong Nov 23, 2022

scottstanie Dec 7, 2022

scottstanie Dec 7, 2022

seongsujeong Dec 7, 2022 •

edited

Loading

seongsujeong Dec 7, 2022

scottstanie commented Dec 7, 2022

seongsujeong commented Dec 7, 2022 •

edited

Loading

scottstanie commented Jan 12, 2023

	elif num_polygon > 1:
	offset_circular = 360.0
	xy_weight_centroid = np.zeros((num_polygon, 3))

	for id_polygon, nodes_polygon in enumerate(dict_geometry['coordinates']):
	dict_sub_polygon={
	"type": "MultiPolygon",
	"coordinates":[nodes_polygon]}

	json_sub_polygon = json.dumps(dict_sub_polygon)
	geom_sub_polygon = ogr.CreateGeometryFromJson(json_sub_polygon)

	centroid_sub_polygon = geom_sub_polygon.Centroid()
	dict_centroid_sub_polygon = json.loads(centroid_sub_polygon.ExportToJson())
	x_centroid = dict_centroid_sub_polygon['coordinates'][0]
	y_centroid = dict_centroid_sub_polygon['coordinates'][1]
	area_sub_polygon = geom_sub_polygon.Area()

	xy_weight_centroid[id_polygon,:] = [(x_centroid + offset_circular) % offset_circular,
	y_centroid,
	area_sub_polygon]

	# Weighted sum
	x_centroid_weighted_raw = \
	np.sum(xy_weight_centroid[:,0] * xy_weight_centroid[:,2])\
	/ np.sum(xy_weight_centroid[:,2])
	y_centroid_weighted_raw = \
	np.sum(xy_weight_centroid[:,1] * xy_weight_centroid[:,2])\
	/ np.sum(xy_weight_centroid[:,2])

	# Refine the raw result of the weighted sum coordinates
	if x_centroid_weighted_raw > 180.0:
	x_centroid = x_centroid_weighted_raw - offset_circular
	elif x_centroid_weighted_raw < -180.0:
	x_centroid = x_centroid_weighted_raw + offset_circular
	else:
	x_centroid = x_centroid_weighted_raw

	if y_centroid_weighted_raw > 180.0:
	y_centroid = y_centroid_weighted_raw - offset_circular
	elif y_centroid_weighted_raw < -180.0:
	y_centroid = y_centroid_weighted_raw + offset_circular
	else:
	y_centroid = y_centroid_weighted_raw

shapely version of burst boundary processing #2

shapely version of burst boundary processing #2

Conversation

scottstanie commented Oct 25, 2022 • edited Loading

seongsujeong left a comment

Choose a reason for hiding this comment

seongsujeong Nov 23, 2022

Choose a reason for hiding this comment

scottstanie Dec 7, 2022

Choose a reason for hiding this comment

scottstanie Dec 7, 2022

Choose a reason for hiding this comment

seongsujeong Dec 7, 2022 • edited Loading

Choose a reason for hiding this comment

seongsujeong Dec 7, 2022

Choose a reason for hiding this comment

scottstanie commented Dec 7, 2022

seongsujeong commented Dec 7, 2022 • edited Loading

scottstanie commented Jan 12, 2023

scottstanie commented Oct 25, 2022 •

edited

Loading

seongsujeong Dec 7, 2022 •

edited

Loading

seongsujeong commented Dec 7, 2022 •

edited

Loading