WIP: Geocoded burst stack processor #18
Conversation
@vbrancat I don't think you have pushed any code. Would you please check and push it?
@vbrancat now that the repo reorganization is merged, would you please rebase so that we can start evaluating this PR?
@hfattahi let me try to test the branch. I will push all the updates when ready :s
src/compass/s1_geo_stack.py
Outdated
with open(runfile_name, 'w') as rsh:
    path = os.path.dirname(os.path.realpath(__file__))
    rsh.write(
        f'python {path}/geo_cslc.py {runconfig_path}\n')
Suggested change:
-        f'python {path}/geo_cslc.py {runconfig_path}\n')
+        f'python {path}/s1_cslc.py {runconfig_path}\n')
bursts = load_bursts(safe, orbit_path, subswath)
for burst in bursts:
    if burst.burst_id in list(set(burst_map['burst_id'])):
        runconfig_path = create_runconfig(burst, safe, orbit_path,
Currently runconfig_path is a relative path. It would be great to make it an absolute path.
No, the runconfig_path is already an absolute path. If you inspect one of the generated run files you can see that, e.g.,
python /u/aurora0/vbrancat/COMPASS/src/compass/s1_cslc.py /mnt/aurora-r0/vbrancat/data/S1/stack_processor/Rosamond/geocoded//runconfigs/geo_runconfig_20220501_t64_iw3_b207.yaml
Very exciting PR. Before digging into the code, I'm trying to be a user and give it a try; I provided two minor comments above. One nice piece of functionality would be start and end dates: assume there is a folder with all SLCs, but we want to process only certain dates. It would be nice to have that flexibility. Another relevant piece of functionality would be allowing a list of specific dates to be processed, or a list of specific dates to be excluded.
with open(runfile_name, 'w') as rsh:
    path = os.path.dirname(os.path.realpath(__file__))
    rsh.write(
This currently creates many run files, each of which has only one command to run. An alternative would be one run file with many commands, one command per line; the latter allows running multiple commands in parallel, for example using this. I wonder if it is preferred to have one run file with one command per line, or multiple run files with one command per file?
Maybe @seongsujeong has a preference?
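A sketch of the single-run-file alternative (illustrative only, not the PR's implementation; it reuses runfile_name, path, and runconfig_path from the snippet above, with append mode so each burst contributes one line):

import os

# Append one command per burst to a single shared run file
with open(runfile_name, 'a') as rsh:
    path = os.path.dirname(os.path.realpath(__file__))
    rsh.write(f'python {path}/s1_cslc.py {runconfig_path}\n')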
@seongsujeong before I make any modification, what is your opinion with respect to the comment above?
I've been running them in parallel like
ls run_files | xargs -n 1 -P 10 sh -c 'bash run_files/$0 > logfile-$0.log'
(-n 1 gives each sh invocation a single run file, so -P 10 can actually run ten at a time), so for now I've kept them as separate files.
@hfattahi the latest commit should address the following:
I should do some code clean-up (move the orbit download to the reader) and improve the pandas dataframe filtering (it is a bit slow)
def get_orbit_dict(sensor_id, start_time, end_time, orbit_type):
    '''
    Query Copernicus GNSS API to find latest orbit file

    Parameters:
    ----------
    sensor_id: str
        Sentinel satellite identifier ('A' or 'B')
    start_time: datetime object
        Sentinel start acquisition time
    end_time: datetime object
        Sentinel end acquisition time
    orbit_type: str
        Type of orbit to download (AUX_POEORB: precise, AUX_RESORB: restituted)

    Returns:
    -------
    orbit_dict: dict
        Python dictionary with [orbit_name, orbit_type, download_url]
    '''
    # Check if correct orbit_type
    if orbit_type not in ['AUX_POEORB', 'AUX_RESORB']:
        err_msg = f'{orbit_type} not a valid orbit type'
        raise ValueError(err_msg)

    # Add a 30 min margin to start_time and end_time
    pad_start_time = start_time - datetime.timedelta(hours=0.5)
    pad_end_time = end_time + datetime.timedelta(hours=0.5)
    new_start_time = pad_start_time.strftime('%Y-%m-%dT%H:%M:%S')
    new_end_time = pad_end_time.strftime('%Y-%m-%dT%H:%M:%S')
    query_string = f"startswith(Name,'S1{sensor_id}') and substringof('{orbit_type}',Name) " \
                   f"and ContentDate/Start lt datetime'{new_start_time}' and ContentDate/End gt datetime'{new_end_time}'"
    query_params = {'$top': 1, '$orderby': 'ContentDate/Start asc',
                    '$filter': query_string}
    query_response = requests.get(url=scihub_url, params=query_params,
                                  auth=(scihub_user, scihub_password))
    # Parse XML tree from query response
    xml_tree = ElementTree.fromstring(query_response.content)
    # Extract w3.org URL
    w3_url = xml_tree.tag.split('feed')[0]

    # Extract orbit's name, id, url
    orbit_id = xml_tree.findtext(
        f'.//{w3_url}entry/{m_url}properties/{d_url}Id')
    orbit_url = f"{scihub_url}('{orbit_id}')/$value"
    orbit_name = xml_tree.findtext(f'./{w3_url}entry/{w3_url}title')

    if orbit_id is not None:
        orbit_dict = {'orbit_name': orbit_name, 'orbit_type': orbit_type,
                      'orbit_url': orbit_url}
    else:
        orbit_dict = None
    return orbit_dict


def download_orbit(output_folder, orbit_url):
    '''
    Download S1-A/B orbits

    Parameters:
    ----------
    output_folder: str
        Path to directory where to store orbits
    orbit_url: str
        Remote url of orbit file to download
    '''
    response = requests.get(url=orbit_url, auth=(scihub_user, scihub_password))
    # Get header and find filename
    header = response.headers['content-disposition']
    _, header_params = cgi.parse_header(header)
    # Construct orbit filename
    orbit_filename = os.path.join(output_folder, header_params['filename'])
    # Save orbits
    open(orbit_filename, 'wb').write(response.content)
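A hedged usage sketch of the two functions above. The acquisition times are made up, and the module-level scihub_url, scihub_user, and scihub_password credentials (plus the imports) are assumed to be defined as in the rest of the file:

import datetime

# Illustrative acquisition window for an S1-A scene
start_time = datetime.datetime(2022, 5, 1, 13, 52, 49)
end_time = datetime.datetime(2022, 5, 1, 13, 53, 16)

# Try precise orbits first, fall back to restituted ones
orbit_dict = get_orbit_dict('A', start_time, end_time, 'AUX_POEORB')
if orbit_dict is None:
    orbit_dict = get_orbit_dict('A', start_time, end_time, 'AUX_RESORB')
if orbit_dict is not None:
    download_orbit('orbits', orbit_dict['orbit_url'])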
Eventually replace both with this?
Agreed, this will be replaced with the orbit download functionality in the s1-reader. I will wait for that PR to be merged before applying any related modification to this PR.
if burst_id is not None:
    burst_map = prune_dataframe(burst_map, 'burst_id', burst_id)
Can avoid this prune_dataframe call if generate_burst_map takes in burst_id and uses it to filter bursts. Like this.
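A minimal sketch of that filtering, assuming the helpers that appear elsewhere in this PR (load_bursts, get_orbit_file_from_list) are importable, and with illustrative row fields:

import glob
import pandas as pd

def generate_burst_map(zip_files, orbit_dir, i_subswath, burst_id=None):
    '''Sketch: build the burst map, filtering by burst_id up front.

    burst_id is assumed to be an iterable of burst IDs to keep (or None).
    '''
    rows = []
    for safe in zip_files:
        orbit_path = get_orbit_file_from_list(safe,
                                              glob.glob(f'{orbit_dir}/S1*'))
        for subswath in i_subswath:
            for burst in load_bursts(safe, orbit_path, subswath):
                # Skip bursts the user did not ask for
                if burst_id is not None and burst.burst_id not in burst_id:
                    continue
                rows.append({'burst_id': burst.burst_id,
                             'safe': safe,
                             'orbit_path': orbit_path})
    return pd.DataFrame(rows)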
if start_date is not None:
    burst_map = burst_map[burst_map['date'] >= start_date]
if end_date is not None:
    burst_map = burst_map[burst_map['date'] <= end_date]

# Exclude some dates if the user requires it
if exclude_dates is not None:
    burst_map = prune_dataframe(burst_map, 'date', exclude_dates)
Like burst_id, these bits of pruning could happen in generate_burst_map as well.
The start and end date pruning now occurs before the map generation, so that the number of zip files to load is minimized.
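A sketch of that pre-generation pruning, assuming the standard SAFE naming convention (in which the acquisition start time is the sixth underscore-separated field of the filename) and YYYYMMDD date strings; the function name is illustrative:

import os

def prune_zips_by_date(zip_files, start_date=None, end_date=None,
                       exclude_dates=None):
    '''Sketch: drop SAFE zips outside [start_date, end_date] before loading.'''
    kept = []
    for safe in zip_files:
        # e.g. S1A_IW_SLC__1SDV_20220501T135249_... -> '20220501'
        date = os.path.basename(safe).split('_')[5][:8]
        if start_date is not None and date < start_date:
            continue
        if end_date is not None and date > end_date:
            continue
        if exclude_dates is not None and date in exclude_dates:
            continue
        kept.append(safe)
    return kept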
orbit_path = get_orbit_file_from_list(safe,
                                      glob.glob(f'{orbit_dir}/S1*'))
for subswath in i_subswath:
    bursts = load_bursts(safe, orbit_path, subswath)
If Sentinel1BurstSlc objects are stored in burst_map, then they don't need to be reloaded here. If my suggestions to filter in generate_burst_map are accepted, that should keep the burst_map object from getting too big?
This suggestion is implemented in #53, and it cuts the script run time in half (since 95% or more of the script time is spent loading, and every zip was loaded twice).
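In terms of the generate_burst_map sketch above, the change amounts to keeping the loaded object in each row and reading it back later; the 'burst' column name is illustrative, and the actual change lives in #53:

# Store the loaded Sentinel1BurstSlc alongside its metadata...
rows.append({'burst_id': burst.burst_id,
             'safe': safe,
             'burst': burst})

# ...and reuse it downstream instead of calling load_bursts() a second time.
for _, row in burst_map.iterrows():
    burst = row['burst']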
                                 is_split_spectrum,
                                 low_band, high_band, pol,
                                 x_spac, y_spac)
date_str = str(date)
I think this line should be moved to right after line 534. Right now, if the "else" triggers, the logging entry on line 552 uses date_str, but it isn't defined.
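Schematically, the fix is to hoist the assignment above the branch; the condition and values below are placeholders, not the script's actual code:

import logging

date = '20220501'              # placeholder value
date_str = str(date)           # hoisted above the if/else...
runconfig_created = False      # placeholder condition
if runconfig_created:
    logging.info(f'processing {date_str}')
else:
    logging.info(f'skipping {date_str}')  # ...so the else-branch is safe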
I believe I've addressed the comments here, so I'm moving the final implementation to #53.
This PR adds a script for the geocoded burst stack processor.
The script works on a static stack of S1-A/B SAFE files and operates at the burst level.
Given a reference acquisition date in the stack, the script identifies the common bursts among the static stack of SAFE files and generates a runconfig and a bash file for running the geocoded burst workflow.
Stacks of bursts with the same burst ID are geocoded on the same grid, which makes their stitching easier.
If the orbit directory is not provided, the script automatically downloads the orbit ephemeris files for each SAFE file of the stack.
TO DO: