
Stitching geocoded bursts for stack processing #14

Closed
wants to merge 12 commits into from

Conversation

@vbrancat (Contributor) commented Apr 4, 2022

This PR provides a script to stitch S1-A/B bursts for stack processing, i.e., the stitched bursts forming the output SLCs all have the same shape and can be directly interfered. To efficiently handle the list of burst IDs to stitch, the PR uses pandas, which would be a new (but lightweight) dependency for COMPASS.

The algorithm implements the following steps:

  1. If a list of burst IDs to stitch is not provided, it identifies the burst IDs common to all dates and uses those for stitching. Otherwise, only the provided list of burst IDs is used.

  2. For each unique burst ID, it identifies the common burst boundary (on the ground) among the different dates and uses it to cut the bursts at each date. This is implemented by saving a shapely.geometry.Polygon to an ESRI Shapefile, which gdal.Warp can then use via its cutline feature to cut the different bursts.

  3. All the cut bursts for the same date are then stitched together to form the output SLCs. All the output SLCs have the same shape, as they are formed from the commonly identified bursts.
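The common-burst identification in step 1 amounts to a set intersection over the per-date burst lists. A minimal sketch with made-up burst IDs and dates (the real script walks the date-organized burst directory instead):

```python
from functools import reduce

# Hypothetical mapping of acquisition date -> burst IDs found on disk
bursts_by_date = {
    "20220101": ["t071_151200_iw1", "t071_151201_iw1", "t071_151202_iw1"],
    "20220108": ["t071_151201_iw1", "t071_151202_iw1"],
    "20220115": ["t071_151200_iw1", "t071_151201_iw1", "t071_151202_iw1"],
}

# Burst IDs present on every date; only these are stitched, so every
# output SLC ends up with the same shape
common_ids = sorted(reduce(set.intersection,
                           (set(ids) for ids in bursts_by_date.values())))
print(common_ids)  # ['t071_151201_iw1', 't071_151202_iw1']
```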

Assumptions
The script makes the following assumptions:

  1. The directory containing the input bursts is organized by date. This should be a pretty common assumption for a stack processor.

  2. Metadata are in JSON format and contain info on the granule_id (i.e., filename), date, polygon, burst_id, and epsg. These should all be pretty reasonable assumptions.

  3. Untested: the algorithm should also work for range/Doppler co-registered stacks of bursts, assuming the same metadata (e.g., polygon of valid pixels) are provided.
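For reference, a metadata file satisfying assumption 2 might look like the following sketch (all field values are made up for illustration; the polygon is WKT):

```json
{
  "granule_id": "t071_151201_iw1_20220101.slc",
  "date": "20220101",
  "burst_id": "t071_151201_iw1",
  "polygon": "POLYGON ((399960 3900000, 409960 3900000, 409960 3890000, 399960 3890000, 399960 3900000))",
  "epsg": 32611
}
```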

Testing
The PR has been tested on a small stack of S1-A/B data that can be found on the aurora server at /mnt/aurora-r0/vbrancat/data/S1/gburst_stitching/ . Below is a picture of an interferogram formed by randomly selecting a reference and a secondary stitched SLC from the processed stack.

TO DO

  1. Check whether gdal.Warp allows setting the number of threads to use and expose it as a command-line option
  2. Expose the X/Y resolution of the stitched bursts and the EPSG code for reprojection as command-line options
  3. Check that the algorithm can stitch multiband rasters (e.g., geocoded bursts with main, low, and high bands)

[Image: interferogram formed from the stitched SLCs]

@vbrancat vbrancat changed the title Stitching geocoded burst for stack processing Stitching geocoded bursts for stack processing Apr 4, 2022
Comment on lines 1 to 14
import argparse
import glob
import json
import os
import time

import isce3
import journal
import pandas as pd
import shapely.wkt
from datetime import datetime
from compass.utils import helpers
from osgeo import gdal, ogr
from shapely.geometry import Polygon
Contributor:
Suggested change

Original:

    import argparse
    import glob
    import json
    import os
    import time
    import isce3
    import journal
    import pandas as pd
    import shapely.wkt
    from datetime import datetime
    from compass.utils import helpers
    from osgeo import gdal, ogr
    from shapely.geometry import Polygon

Suggested:

    import argparse
    from datetime import datetime
    import glob
    import json
    import os
    import time

    import isce3
    import journal
    from osgeo import gdal, ogr
    import pandas as pd
    import shapely.wkt
    from shapely.geometry import Polygon

    from compass.utils import helpers

PEP8 ordering and grouping

Contributor Author:

Just a heads-up: using isort (a package that sorts imports according to the PEP 8 convention) gives me a slightly different result.

Contributor:

The isort output appears to group from x import y incorrectly; e.g., from datetime import datetime should be with the standard library group.
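If the import ordering should be enforced automatically, isort can be configured to keep from x import y lines interleaved with plain imports inside each section. A possible pyproject.toml fragment (option names are from isort's documented settings; whether this exactly reproduces the suggested layout is untested):

```toml
[tool.isort]
# Sort "from x import y" together with "import x" inside each section,
# so "from datetime import datetime" lands in the standard-library group
force_sort_within_sections = true
# Treat the project package as first-party so it sorts into its own group
known_first_party = ["compass"]
```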

return poly_int, epsg_int


def get_stitching_dict(indir):
Contributor:

Suggested change

    - def get_stitching_dict(indir):
    + def get_stitching_dataframe(bursts_dir, wanted_burst_ids):

Misnomer; rename for clarity.
What do you think about filtering unwanted burst IDs on dataframe init vs. filtering them out later?

Contributor Author:

I am not sure it is a good idea. We might want to filter for different reasons at different stages. One thing that I have noticed is that for bigger dataframes the filtering operation might take some time. Any hints on how to make it faster with pandas dataframes?
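On the filtering-speed question: for exact burst-ID matches, Series.isin is typically faster than a regex-based str.contains and cannot over-match substrings. A small sketch with made-up IDs:

```python
import pandas as pd

# Toy stitching dataframe; burst IDs and dates are made up
df = pd.DataFrame({
    "burst_id": ["t071_151200_iw1", "t071_151201_iw1", "t071_151202_iw1"],
    "date": ["20220101", "20220101", "20220108"],
})

wanted = ["t071_151200_iw1", "t071_151202_iw1"]

# isin() does exact hash-based membership tests; str.contains() compiles
# a regex and scans every string, which is slower and matches substrings
pruned = df[df["burst_id"].isin(wanted)].reset_index(drop=True)
print(pruned["burst_id"].tolist())  # ['t071_151200_iw1', 't071_151202_iw1']
```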

Contributor:

I made my suggestion because this block of code:

    data_dict = get_stitching_dict(indir)

    # If stitching some bursts, prune dataframe to
    # contains only the burst IDs to stitch
    if burst_list is not None:
        data_dict = prune_dataframe(data_dict,
                                    'burst_id', burst_list)

By filtering out unwanted bursts in get_stitching_dict, the call to prune_dataframe is no longer needed, and other calls to prune_dataframe will work on smaller dataframes. I wager a smaller dataframe will be faster to filter. If burst_list is None, downstream behavior is not impacted since everything in indir will be in data_dict.

My perspective on filtering is based on what's present in the code of this PR. What other filtering stages besides the 2 here do you have in mind?

[Several resolved review comments on src/compass/utils/stitching/stitch_burst.py — outdated]
Comment on lines 97 to 102
    # Identify common burst IDs among different dates
    ids2stitch = get_common_burst_ids(data_dict)

    # Prune dataframe to contain only the IDs to stitch
    data_dict = prune_dataframe(data_dict,
                                'burst_id', ids2stitch)
Contributor:

If prune_dataframe above is removed, then get_common_burst_ids and prune_dataframe can be merged into a prune_uncommon_burst_ids function? Unless you see prune_dataframe being something to be imported and used elsewhere...

Something like:

def prune_uncommon_burst_ids(data):
    '''
    Keep only rows whose burst ID is common to all dates

    Parameters
    ----------
    data: pandas.DataFrame
        Dataframe to be pruned

    Returns
    -------
    dataf: pandas.DataFrame
        Pruned dataframe containing only burst IDs common to all dates
    '''
    unique_dates = list(set(data['date']))

    # Initialize list of common burst IDs
    common_id = data.burst_id[data.date == unique_dates[0]]

    for date in unique_dates:
        ids = data.burst_id[data.date == date]
        common_id = sorted(list(set(ids.tolist()) & set(common_id)))

    # Remove burst IDs not common to all dates
    pattern = '|'.join(common_id)
    dataf = data.loc[data['burst_id'].str.contains(pattern,
                                                   case=False)]
    return dataf
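For comparison, a sketch of the same pruning built on groupby plus an exact-match isin filter, which sidesteps the substring matching of str.contains (function and column names follow the PR; the toy dataframe is made up):

```python
from functools import reduce

import pandas as pd


def prune_uncommon_burst_ids(data: pd.DataFrame) -> pd.DataFrame:
    '''Keep only rows whose burst_id occurs on every date.'''
    # One set of burst IDs per acquisition date
    ids_per_date = data.groupby('date')['burst_id'].apply(set)
    # Burst IDs present on every date
    common_ids = reduce(set.intersection, ids_per_date)
    # Exact-match filter; avoids regex substring over-matching
    return data[data['burst_id'].isin(common_ids)]


df = pd.DataFrame({
    'date': ['20220101', '20220101', '20220108'],
    'burst_id': ['b1', 'b2', 'b1'],
})
print(prune_uncommon_burst_ids(df)['burst_id'].tolist())  # ['b1', 'b1']
```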

Contributor Author:

The geocoded burst stack processor would reuse the functionality for pruning the dataframe based on several criteria. Therefore, I am inclined to keep this functionality as it is and reuse it elsewhere

Contributor:

Thank you for pointing out the commonality with #18; I completely missed this. In light of this, I think prune_dataframe and get_common_burst_ids could live in their own file, src/compass/utils/dataframe_tools.py, as a shared import between stack processing and stitching.

[Two more resolved review comments on src/compass/utils/stitching/stitch_burst.py — outdated]
@vbrancat (Contributor Author) commented Feb 1, 2023

@scottstanie Maybe we can close this PR? The stitching code has been incorporated in dolphin and it is much easier :)

@scottstanie (Contributor):

Sure! There's a chance that we later come across some case where stitching geocoded SLCs leads to an easier time... but for the current effort of the displacement workflow you're right that we've got that now in dolphin 👍

@vbrancat (Contributor Author) commented Feb 1, 2023

Closed as duplicated in the displacement workflow.

@vbrancat vbrancat closed this Feb 1, 2023
@vbrancat vbrancat deleted the geo_stitching branch February 22, 2023 00:16
4 participants