[Bug]: Nondeterminism in RTC Download via Data Subscriber #877

Open
niarenaw opened this issue Jun 11, 2024 · 3 comments
Labels
bug Something isn't working needs triage Issue that requires triage pcm.r03-dswx-s1 PCM Release 3 - DSWx-S1

Comments

@niarenaw

Checked for duplicates

Yes - I've already checked

Describe the bug

In addressing nasa/opera-sds#48, some unexpected behavior was observed when using the data subscriber script. Several thousand RTC products were generated and then delivered to ASF UAT. The goal was to generate DSWx-S1 products for the tile/date pairs listed in the ticket above. The following commands were run, in the order listed:

FIRST COMMAND: Trigger all RTC downloads for the given delivery time range

python3 ~/mozart/ops/opera-pcm/data_subscriber/daac_data_subscriber.py query \
    -c OPERA_L2_RTC-S1_V1 \
    --job-queue=opera-job_worker-rtc_data_download \
    --chunk-size 1 \
    --release-version="$RELEASE" \
    --endpoint=UAT \
    --coverage-target=1 \
    --start-date="$START" \
    --end-date="$END" \
    --processing-mode=historical

SECOND COMMAND: Trigger all RTC downloads for the given delivery time range and tile

python3 ~/mozart/ops/opera-pcm/data_subscriber/daac_data_subscriber.py query \
    -c OPERA_L2_RTC-S1_V1 \
    --job-queue=opera-job_worker-rtc_data_download \
    --chunk-size 1 \
    --release-version="$RELEASE" \
    --endpoint=UAT \
    --coverage-target=1 \
    --start-date="$START" \
    --end-date="$END" \
    --include-regions="$TILE" \
    --processing-mode=historical

THIRD COMMAND: Trigger all RTC downloads for the given temporal (acquisition) time range and tile

python3 ~/mozart/ops/opera-pcm/data_subscriber/daac_data_subscriber.py query \
    -c OPERA_L2_RTC-S1_V1 \
    --job-queue=opera-job_worker-rtc_data_download \
    --chunk-size 1 \
    --release-version="$RELEASE" \
    --endpoint=UAT \
    --coverage-target=1 \
    --start-date="$START" \
    --end-date="$END" \
    --include-regions="$TILE" \
    --use-temporal \
    --processing-mode=historical
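The three commands share every flag except --include-regions and --use-temporal, so they can be expressed through a single wrapper. This is a hypothetical sketch: the rtc_query function and the RTC_QUERY_DRY_RUN switch are not part of opera-pcm. It reuses only the flags shown above and, by default, prints the assembled command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of opera-pcm): the flags shared by all three
# query commands live here; per-command extras are passed as arguments.
SUBSCRIBER="$HOME/mozart/ops/opera-pcm/data_subscriber/daac_data_subscriber.py"

rtc_query() {
  # Extras such as --include-regions="$TILE" or --use-temporal arrive via "$@".
  local cmd=(
    python3 "$SUBSCRIBER" query
    -c OPERA_L2_RTC-S1_V1
    --job-queue=opera-job_worker-rtc_data_download
    --chunk-size 1
    --release-version="$RELEASE"
    --endpoint=UAT
    --coverage-target=1
    --start-date="$START"
    --end-date="$END"
    "$@"
    --processing-mode=historical
  )
  # Dry run by default: print the assembled command instead of executing it.
  if [[ "${RTC_QUERY_DRY_RUN:-1}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Usage: `rtc_query` for the first command, `rtc_query --include-regions="$TILE"` for the second, and `rtc_query --include-regions="$TILE" --use-temporal` for the third; set `RTC_QUERY_DRY_RUN=0` to actually submit. Keeping the shared flags in one place makes it easier to confirm that the three runs differ only in the filters being tested.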

Despite running these commands for many tile/date combinations, certain products that we expected to be generated never were. One such pair is (17RNH, 2022-11-10). I would also expect the jobs submitted by each later command to be deduplicated, since each command triggers a subset of the previous one's downloads, but new jobs were always kicked off. After all jobs had run, the only way to trigger certain missing products was to run the data subscriber in a loop with --native-id set to each RTC derived from the SLC that covered the MGRS tile on the missing date.
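The --native-id workaround can be sketched as a loop that emits one query per RTC. This is a hypothetical helper: how the RTC native IDs were collected is not shown in this issue (here they are read from stdin, one per line), and the flags used alongside --native-id are assumed to mirror the commands above minus the date and region filters. Commands are printed rather than executed, so they can be reviewed before piping to bash.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the --native-id workaround: reads RTC native IDs from
# stdin (one per line) and prints one subscriber query per ID. The flags used
# besides --native-id are an assumption, not confirmed by the issue.
SUBSCRIBER="$HOME/mozart/ops/opera-pcm/data_subscriber/daac_data_subscriber.py"

print_native_id_queries() {
  local rtc_id
  # "|| [[ -n ... ]]" also handles a final line with no trailing newline.
  while IFS= read -r rtc_id || [[ -n "$rtc_id" ]]; do
    [[ -z "$rtc_id" ]] && continue  # skip blank lines
    # %q shell-quotes each value so the output can be piped to bash safely.
    printf 'python3 %q query -c OPERA_L2_RTC-S1_V1 --job-queue=opera-job_worker-rtc_data_download' "$SUBSCRIBER"
    printf ' --chunk-size 1 --release-version=%q --endpoint=UAT --native-id=%q\n' "$RELEASE" "$rtc_id"
  done
}
```

Usage, with a hypothetical ID list: `print_native_id_queries < rtc_ids.txt` to review the commands, then `print_native_id_queries < rtc_ids.txt | bash` to submit them.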

What did you expect?

  1. I expected the first command to trigger downloads for all RTCs with any coverage, since --coverage-target=1.
  2. I expected all jobs from the second command to dedupe (or at least to trigger the missing products).
  3. I expected all jobs from the third command to dedupe (or at least to trigger the missing products).
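One way to make expectations 2 and 3 checkable is to compare the RTC native IDs each command submitted jobs for. A minimal sketch, assuming each command's submitted IDs were exported to a text file with one native ID per line (the file names and the missing_from_earlier helper are hypothetical). It prints IDs that a later command submitted but the earlier one did not; if the first command really triggered every RTC with any coverage, the output for (first, second) and (first, third) should be empty.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print RTC native IDs present in the later command's list
# but absent from the earlier one. Each argument is a file, one ID per line.
missing_from_earlier() {
  local earlier="$1" later="$2"
  # comm needs identically sorted input, so sort both with the same locale.
  # -13 suppresses lines unique to the earlier list and lines common to both,
  # leaving only lines unique to the later list.
  LC_ALL=C comm -13 <(LC_ALL=C sort -u "$earlier") <(LC_ALL=C sort -u "$later")
}
```

Usage: `missing_from_earlier first_ids.txt third_ids.txt`. Any ID printed is an RTC the third command picked up that the first command did not.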

Reproducible steps

1. Generate and deliver a decent-sized dataset of RTC products
2. Run the commands above
3. Confirm that certain products were not delivered

Environment

- Processing was done on PST using release 3.0.0-er.3.0
@niarenaw niarenaw added bug Something isn't working needs triage Issue that requires triage labels Jun 11, 2024
@hhlee445 hhlee445 added the pcm.r03-dswx-s1 PCM Release 3 - DSWx-S1 label Jun 11, 2024
@hhlee445
Contributor

@niarenaw can you also provide the specific time ranges and the geojson file?

@niarenaw
Author

  • time range for commands 1 and 2 = 2024-06-06T00:00:00Z to 2024-06-07T23:59:59Z (delivery time)
  • time range for command 3 = 2022-11-10T00:00:00Z to 2022-11-10T23:59:59Z (temporal/acquisition time)
  • geojson: { "type": "FeatureCollection", "features": [{ "type": "Feature", "properties": { "type": "S2", "identifier": "17RNH" }, "geometry": { "type": "MultiPolygon", "coordinates": [ [ [ [ -81.0001987094, 25.3167339867, 0.0 ], [ -79.909362219, 25.3126993615, 0.0 ], [ -79.9180080699, 24.3212402892, 0.0 ], [ -81.0001971338, 24.3250964058, 0.0 ], [ -81.0001987094, 25.3167339867, 0.0 ] ] ] ] } }] }
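The windows above map onto the $START/$END variables used in the three commands. A minimal sketch; set_window is a hypothetical helper, and $RELEASE and $TILE are not given in this thread (the exact form --include-regions expects for 17RNH is not stated), so they are left for the caller to set.

```shell
#!/usr/bin/env bash
# Windows from the comment above. Commands 1 and 2 filter on delivery time:
DELIVERY_START="2024-06-06T00:00:00Z"
DELIVERY_END="2024-06-07T23:59:59Z"
# Command 3 filters on acquisition time (used with --use-temporal):
TEMPORAL_START="2022-11-10T00:00:00Z"
TEMPORAL_END="2022-11-10T23:59:59Z"

# Hypothetical helper: set START/END for a given command number (1, 2, or 3).
set_window() {
  case "$1" in
    1|2) START="$DELIVERY_START"; END="$DELIVERY_END" ;;
    3)   START="$TEMPORAL_START"; END="$TEMPORAL_END" ;;
    *)   echo "unknown command number: $1" >&2; return 1 ;;
  esac
}
```

Usage: call `set_window 1` before the first command and `set_window 3` before the third, so each run uses the window reported here.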

@chrisjrd
Contributor

chrisjrd commented Jul 9, 2024

The dedupe logic and grace-period logic were updated after this ticket was originally filed. Has this issue been reproduced since?

PR with dedupe logic update: #884
