-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
shapely version of burst boundary processing #2
Conversation
This reverts commit 67648c1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @scottstanie for this wonderful PR. That would greatly improve the performance when generating the burst database.
I've left some comments between the codes below.
My major concern is that I wanted to have the neighboring bursts in the three subswath to have the same EPSG. If I am understanding this code correctly, however, it looks like this PR calculated the EPSG burst by burst.
# antimeridian case | ||
# Transform to the first UTM zone, get the centroid, then transform back | ||
# Doesnt matter if it's the right EPSG; just need it to be in | ||
# projected coordinates to make the centroid easy | ||
utm_epsg = 32601 if geom.centroid.y > 0 else 32701 | ||
src, dst = 4326, utm_epsg | ||
point = transform(transform(geom, src, dst).centroid, dst, src) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if that is the reliable way to calculate the centroid. Even the burst coordinates in UTM 1N has discontinuities on equator (so does every UTMs). Also there would be lots of distortions on the polygons that crosses the equator. See the attached image below.
The "antimeridian case" was already taken care of as shown in the code snipped below. The idea is basically "wrap" the longitudinal coordinates when we detect multiple polygons, calculate centroid individually, and weight-sum the coordinates by the areas of each polygons.
Lines 290 to 333 in ced146b
elif num_polygon > 1: | |
offset_circular = 360.0 | |
xy_weight_centroid = np.zeros((num_polygon, 3)) | |
for id_polygon, nodes_polygon in enumerate(dict_geometry['coordinates']): | |
dict_sub_polygon={ | |
"type": "MultiPolygon", | |
"coordinates":[nodes_polygon]} | |
json_sub_polygon = json.dumps(dict_sub_polygon) | |
geom_sub_polygon = ogr.CreateGeometryFromJson(json_sub_polygon) | |
centroid_sub_polygon = geom_sub_polygon.Centroid() | |
dict_centroid_sub_polygon = json.loads(centroid_sub_polygon.ExportToJson()) | |
x_centroid = dict_centroid_sub_polygon['coordinates'][0] | |
y_centroid = dict_centroid_sub_polygon['coordinates'][1] | |
area_sub_polygon = geom_sub_polygon.Area() | |
xy_weight_centroid[id_polygon,:] = [(x_centroid + offset_circular) % offset_circular, | |
y_centroid, | |
area_sub_polygon] | |
# Weighted sum | |
x_centroid_weighted_raw = \ | |
np.sum(xy_weight_centroid[:,0] * xy_weight_centroid[:,2])\ | |
/ np.sum(xy_weight_centroid[:,2]) | |
y_centroid_weighted_raw = \ | |
np.sum(xy_weight_centroid[:,1] * xy_weight_centroid[:,2])\ | |
/ np.sum(xy_weight_centroid[:,2]) | |
# Refine the raw result of the weighted sum coordinates | |
if x_centroid_weighted_raw > 180.0: | |
x_centroid = x_centroid_weighted_raw - offset_circular | |
elif x_centroid_weighted_raw < -180.0: | |
x_centroid = x_centroid_weighted_raw + offset_circular | |
else: | |
x_centroid = x_centroid_weighted_raw | |
if y_centroid_weighted_raw > 180.0: | |
y_centroid = y_centroid_weighted_raw - offset_circular | |
elif y_centroid_weighted_raw < -180.0: | |
y_centroid = y_centroid_weighted_raw + offset_circular | |
else: | |
y_centroid = y_centroid_weighted_raw |
Once we decide the proper EPSG, I think the correct boundary of the burst can be calculated by taking care of the multiple polygons (or using shapely functions as you did here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is a very pleasing figure haha. how'd you make it?
I think that even though it would be terrible to warp normal points from all over the globe into UTM 1N, doing the warp -> centroid -> inverse warp
shouldn't distort it too much? I'm gonna draw myself a figure since i'm unsure, but i'll double check and compare to the EPSG codes from the current version of your DB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well i've confused myself, since it seems like the warping might not necessarily get the right centroid if it's a really nonlinear bad warping... but I'm not sure which (if any) bursts it messed it up for?
burst_id_jpl,EPSG,epsg
t001_000110_iw1,32631,32630
t001_000111_iw1,32631,32630
t001_000112_iw1,32631,32630
t001_000113_iw1,32631,32630
import pandas as pd
import sqlite3
import os
os.chdir("/home/staniewi/dev")
db1 = "burst_map_IW_000001_375887.OPERA-JPL.20220818_113416.sqlite3" # made from original version
db2 = "burst_map_margin3000.sqlite3" # from new version
with sqlite3.connect(db1) as con:
df1 = pd.read_sql("select * from burst_id_map", con)
with sqlite3.connect(db2) as con:
df2 = pd.read_sql("select * from burst_id_map", con)
mismatches = df1.EPSG != df2.epsg
print(df1.shape, df2.shape) # (1127661, 13) (1127661, 13)
print(mismatches.sum()) # 104617
df_merged = pd.merge(df1.loc[mismatches, ['burst_id_jpl','EPSG', ]], df2.loc[mismatches, ['epsg', 'burst_id_jpl']], on="burst_id_jpl")
df_merged.to_csv("~/dev/burst_diffs.csv", index=False)
Looking at the mismatches, a lot are just off by one, which might be either bursts on the borderline of two UTM zones getting a slightly different centroid, or might be your keeping adjacent subswaths in the same UTM zone:
(burst id, then your version of the EPSG, then my script's output)
burst_id_jpl,EPSG,epsg
t001_000110_iw1,32631,32630
t001_000111_iw1,32631,32630
t001_000112_iw1,32631,32630
... (lots more like this)
t001_000448_iw1,3413,32627
t001_000449_iw1,3413,32627
(more on the borderline between polar and non-polar regions)
The biggest differences i've seen look like this:
t001_000669_iw2,32612,32601
t001_000669_iw3,32612,32660
t001_000670_iw1,32614,32601
t001_000670_iw2,32614,32601
where these bursts on the anti-meridian are off due to what you pointed out in #1 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you are taking a look at the old version of the database. In the updated algorithm, both your and my db have EPSG 32601 for t001_000670_iw1
. I'll upload it to the server and send you the link. Or you can pull the changes, and run it on your side.
Regarding the EPSG numbers whose difference is 1, or UTM / PS: I think that is from the difference in the detail. I determine the EPSG from IW2, and assign the same projection for the corresponding bursts in IW1 and IW3. That can cause difference in the projection between yours and mine, which can be especially found around the "projection border"
Therefore I would not be surprised if there are lots of IW1 or IW3 bursts with different EPSG. In that case, comparing the projection only for IW2 bursts would make sense. I've calculated the EPSGs from your and my implementation, and the only occasion that were different was t037_077712_iw2
My DB had 32601, and yours was 32660. That difference is acceptable because they are in fact right next to each other in geographic coordinates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also regarding the distortion of polygon bursts on UTM1N (in the figure above): I was bit confused when thinking about it. I've checked that you apply "warp-centroid-warp_back" only for Antimeridian case i.e. multiple polygon in feature's geometry. Using UTM 1N would be okay in that case. Sorry for the false alarm.
We can change that, but what would be the benefit of having IW1,2,3 in the same EPSG? I thought the two options we were considering were
if we're doing things burst-wise, then it seems like (as an example) burst |
I think we have to ultimately do the For example let's think about this case (actual ESPG might differ)
The first three bursts are supposed to be stitch-able because they are from the same orbit. However, if we do not have the same projection for them above, it would be very tricky to do mosaic them because we need to resample the third one. Would we stitch the last two bursts in the table? no. because they are not from the same orbit. Therefore I think it makes more sense to determine the projection from "burst triplet", rather than each individual bursts' centroids. |
this was closed in favor of #8 , which implements a superset of the features here. |
I'm putting this up as possible suggestions for ways to fix #1 and #3 . While I made it a separate script, it could be merged into the
build_database
, I just was tinkering to better understand some of the logic you had implemented and (for instance) why ESAs multipolygons were not actually working when I ran spatialite functions on them in the database. I'll let you handle the long-term fixes however you'd like.The main notable changes are
build_database.wkt2extent()
#1 )snap
calculation, I'm now usingfloor/ceil
instead ofround
, as was done in @vbrancat 's stack processing script.geometry
column using the spatialite methodSELECT AddGeometryColumn(...)
Features which could be nice to keep, even if you don't use this script:
burst_id_jpl
, epsg, and the 4 bound columns. This makes the DB size only 72 Mbburst_id_jpl
, which makes each lookup take <1ms instead of ~200 ms (this becomes more noticeable for stack processing when you have 50 frames x 30 bursts = 1500 bursts to lookup)