Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shapely version of burst boundary processing #2

Closed
wants to merge 33 commits into from

Conversation

scottstanie
Copy link
Contributor

@scottstanie scottstanie commented Oct 25, 2022

I'm putting this up as possible suggestions for ways to fix #1 and #3 . While I made it a separate script, it could be merged into the build_database, I just was tinkering to better understand some of the logic you had implemented and (for instance) why ESAs multipolygons were not actually working when I ran spatialite functions on them in the database. I'll let you handle the long-term fixes however you'd like.

The main notable changes are

  • For the antimeridian centroids, I convert the lat/lon multipolygon to UTM, get the centroid, then convert back (which fixes Potential bug on build_database.wkt2extent() #1 )
  • for the snap calculation, I'm now using floor/ceil instead of round, as was done in @vbrancat 's stack processing script.
  • In the main database, I create the geometry column using the spatialite method SELECT AddGeometryColumn(...)
  • Not important for the bounding box lookup, but I'm converting the WKB geometries to 2D. For some unknown reason, could not make spatialite work with the 3D multipolygon that ESA gives.

Features which could be nice to keep, even if you don't use this script:

  • I also make a minimal version with just burst_id_jpl, epsg, and the 4 bound columns. This makes the DB size only 72 Mb
  • I added an index to burst_id_jpl, which makes each lookup take <1ms instead of ~200 ms (this becomes more noticeable for stack processing when you have 50 frames x 30 bursts = 1500 bursts to lookup)
  • I added a download function for the original ESA database from their website
  • I added the command line parser regenerate with different snap/margin/output filename options
  • by loading and processing the geometries in python, it takes around ~1min on aurora instead of ~10 min to build the database

Copy link
Collaborator

@seongsujeong seongsujeong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @scottstanie for this wonderful PR. That would greatly improve the performance when generating the burst database.

I've left some comments between the codes below.

My major concern is that I wanted to have the neighboring bursts in the three subswath to have the same EPSG. If I am understanding this code correctly, however, it looks like this PR calculated the EPSG burst by burst.

Comment on lines +15 to +21
# antimeridian case
# Transform to the first UTM zone, get the centroid, then transform back
# Doesnt matter if it's the right EPSG; just need it to be in
# projected coordinates to make the centroid easy
utm_epsg = 32601 if geom.centroid.y > 0 else 32701
src, dst = 4326, utm_epsg
point = transform(transform(geom, src, dst).centroid, dst, src)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if that is the reliable way to calculate the centroid. Even the burst coordinates in UTM 1N has discontinuities on equator (so does every UTMs). Also there would be lots of distortions on the polygons that crosses the equator. See the attached image below.

Screen Shot 2022-11-23 at 11 45 48

The "antimeridian case" was already taken care of as shown in the code snipped below. The idea is basically "wrap" the longitudinal coordinates when we detect multiple polygons, calculate centroid individually, and weight-sum the coordinates by the areas of each polygons.

burst_db/build_database.py

Lines 290 to 333 in ced146b

elif num_polygon > 1:
offset_circular = 360.0
xy_weight_centroid = np.zeros((num_polygon, 3))
for id_polygon, nodes_polygon in enumerate(dict_geometry['coordinates']):
dict_sub_polygon={
"type": "MultiPolygon",
"coordinates":[nodes_polygon]}
json_sub_polygon = json.dumps(dict_sub_polygon)
geom_sub_polygon = ogr.CreateGeometryFromJson(json_sub_polygon)
centroid_sub_polygon = geom_sub_polygon.Centroid()
dict_centroid_sub_polygon = json.loads(centroid_sub_polygon.ExportToJson())
x_centroid = dict_centroid_sub_polygon['coordinates'][0]
y_centroid = dict_centroid_sub_polygon['coordinates'][1]
area_sub_polygon = geom_sub_polygon.Area()
xy_weight_centroid[id_polygon,:] = [(x_centroid + offset_circular) % offset_circular,
y_centroid,
area_sub_polygon]
# Weighted sum
x_centroid_weighted_raw = \
np.sum(xy_weight_centroid[:,0] * xy_weight_centroid[:,2])\
/ np.sum(xy_weight_centroid[:,2])
y_centroid_weighted_raw = \
np.sum(xy_weight_centroid[:,1] * xy_weight_centroid[:,2])\
/ np.sum(xy_weight_centroid[:,2])
# Refine the raw result of the weighted sum coordinates
if x_centroid_weighted_raw > 180.0:
x_centroid = x_centroid_weighted_raw - offset_circular
elif x_centroid_weighted_raw < -180.0:
x_centroid = x_centroid_weighted_raw + offset_circular
else:
x_centroid = x_centroid_weighted_raw
if y_centroid_weighted_raw > 180.0:
y_centroid = y_centroid_weighted_raw - offset_circular
elif y_centroid_weighted_raw < -180.0:
y_centroid = y_centroid_weighted_raw + offset_circular
else:
y_centroid = y_centroid_weighted_raw

Once we decide the proper EPSG, I think the correct boundary of the burst can be calculated by taking care of the multiple polygons (or using shapely functions as you did here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is a very pleasing figure haha. how'd you make it?

I think that even though it would be terrible to warp normal points from all over the globe into UTM 1N, doing the warp -> centroid -> inverse warp shouldn't distort it too much? I'm gonna draw myself a figure since i'm unsure, but i'll double check and compare to the EPSG codes from the current version of your DB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well i've confused myself, since it seems like the warping might not necessarily get the right centroid if it's a really nonlinear bad warping... but I'm not sure which (if any) bursts it messed it up for?

burst_id_jpl,EPSG,epsg
t001_000110_iw1,32631,32630
t001_000111_iw1,32631,32630
t001_000112_iw1,32631,32630
t001_000113_iw1,32631,32630

import pandas as pd
import sqlite3
import os

os.chdir("/home/staniewi/dev")
db1 = "burst_map_IW_000001_375887.OPERA-JPL.20220818_113416.sqlite3" # made from original version
db2 = "burst_map_margin3000.sqlite3"        # from new version

with sqlite3.connect(db1) as con:
    df1 = pd.read_sql("select * from burst_id_map", con)
with sqlite3.connect(db2) as con:
    df2 = pd.read_sql("select * from burst_id_map", con)

mismatches = df1.EPSG != df2.epsg

print(df1.shape, df2.shape)  # (1127661, 13) (1127661, 13)
print(mismatches.sum())      # 104617


df_merged = pd.merge(df1.loc[mismatches, ['burst_id_jpl','EPSG', ]], df2.loc[mismatches, ['epsg', 'burst_id_jpl']], on="burst_id_jpl")
df_merged.to_csv("~/dev/burst_diffs.csv", index=False)

Looking at the mismatches, a lot are just off by one, which might be either bursts on the borderline of two UTM zones getting a slightly different centroid, or might be your keeping adjacent subswaths in the same UTM zone:

(burst id, then your version of the EPSG, then my script's output)

burst_id_jpl,EPSG,epsg
t001_000110_iw1,32631,32630
t001_000111_iw1,32631,32630
t001_000112_iw1,32631,32630
... (lots more like this)
t001_000448_iw1,3413,32627
t001_000449_iw1,3413,32627
(more on the borderline between polar and non-polar regions)

The biggest differences i've seen look like this:

t001_000669_iw2,32612,32601
t001_000669_iw3,32612,32660
t001_000670_iw1,32614,32601
t001_000670_iw2,32614,32601

where these bursts on the anti-meridian are off due to what you pointed out in #1 .

Copy link
Collaborator

@seongsujeong seongsujeong Dec 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you are taking a look at the old version of the database. In the updated algorithm, both your and my db have EPSG 32601 for t001_000670_iw1. I'll upload it to the server and send you the link. Or you can pull the changes, and run it on your side.

Regarding the EPSG numbers whose difference is 1, or UTM / PS: I think that is from the difference in the detail. I determine the EPSG from IW2, and assign the same projection for the corresponding bursts in IW1 and IW3. That can cause difference in the projection between yours and mine, which can be especially found around the "projection border"

Therefore I would not be surprised if there are lots of IW1 or IW3 bursts with different EPSG. In that case, comparing the projection only for IW2 bursts would make sense. I've calculated the EPSGs from your and my implementation, and the only occasion that were different was t037_077712_iw2 My DB had 32601, and yours was 32660. That difference is acceptable because they are in fact right next to each other in geographic coordinates.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also regarding the distortion of polygon bursts on UTM1N (in the figure above): I was bit confused when thinking about it. I've checked that you apply "warp-centroid-warp_back" only for Antimeridian case i.e. multiple polygon in feature's geometry. Using UTM 1N would be okay in that case. Sorry for the false alarm.

make_db.py Show resolved Hide resolved
make_db.py Show resolved Hide resolved
@scottstanie
Copy link
Contributor Author

My major concern is that I wanted to have the neighboring bursts in the three subswath to have the same EPSG. If I am understanding this code correctly, however, it looks like this PR calculated the EPSG burst by burst.

We can change that, but what would be the benefit of having IW1,2,3 in the same EPSG? I thought the two options we were considering were

  1. geocode burst by burst in the most appropriate UTM projection by finding the burst centroid.
  2. create the frame definition in advance so that all ~30 bursts in one frame have the same EPSG, and decide the frame EPSG based on the frame center.

if we're doing things burst-wise, then it seems like (as an example) burst t001_000001_iw1 is no more related to t001_000001_iw2 than it is to t001_000002_iw1 (the next one along track), but was there something you or @hfattahi thought of that created a good reason to keep iw1/2/3 together?

@seongsujeong
Copy link
Collaborator

seongsujeong commented Dec 7, 2022

My major concern is that I wanted to have the neighboring bursts in the three subswath to have the same EPSG. If I am understanding this code correctly, however, it looks like this PR calculated the EPSG burst by burst.

We can change that, but what would be the benefit of having IW1,2,3 in the same EPSG? I thought the two options we were considering were

  1. geocode burst by burst in the most appropriate UTM projection by finding the burst centroid.
  2. create the frame definition in advance so that all ~30 bursts in one frame have the same EPSG, and decide the frame EPSG based on the frame center.

if we're doing things burst-wise, then it seems like (as an example) burst t001_000001_iw1 is no more related to t001_000001_iw2 than it is to t001_000002_iw1 (the next one along track), but was there something you or @hfattahi thought of that created a good reason to keep iw1/2/3 together?

I think we have to ultimately do the 2. i.e. frame-based EPSG definition, as we have discussed in the ADT meeting. But I'm not sure if we have the frame definition now. Please let me know if we do.
Until we have it, I think we have to stick with the EPSG designation strategy that is more stitch-friendly.

For example let's think about this case (actual ESPG might differ)

burst_id_jpl EPSG
t001_000001_iw1 32614
t001_000001_iw2 32614
t001_000001_iw3 32615
t002_000001_iw1 32615

The first three bursts are supposed to be stitch-able because they are from the same orbit. However, if we do not have the same projection for them above, it would be very tricky to do mosaic them because we need to resample the third one.

Would we stitch the last two bursts in the table? no. because they are not from the same orbit. Therefore I think it makes more sense to determine the projection from "burst triplet", rather than each individual bursts' centroids.

@scottstanie
Copy link
Contributor Author

this was closed in favor of #8 , which implements a superset of the features here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Potential bug on build_database.wkt2extent()
2 participants