[New Feature]: Remove superfluous data from DISP-S1 product GRQ ES #1035

Open
philipjyoon opened this issue Dec 4, 2024 · 0 comments
Labels: enhancement (New feature or request), needs triage (Issue that requires triage), pcm.r03-disp-s1

Comments

@philipjyoon
Contributor

Checked for duplicates

Yes - I've already checked

Alternatives considered

Yes - and alternatives don't suffice

Related problems

No response

Describe the feature request

From Slack:

"Looking at the actual DISP-S1 product metadata, it repeats the exact same information over and over again, which is why it holds 1 MB worth of data. It lists the lineage and the input file list (which are themselves redundant with each other) four times over, once for each of the .nc, .png, .iso.xml, etc. files.
Is this all really necessary? As far as I'm aware, we need this metadata for two purposes: 1) enabling bach-ui (do we even run bach-ui anymore?) and 2) DAAC consumption."

"
Ok, I've looked at product2dataset.py, opera_pge_wrapper.py, send_notify_msg.py, send_notify_msg.sh, the bach api utils source code, hysds-io.json.send_notify_msg, the Docker files for cnm send, and the GRQ rule that triggers the cnm send job, and I'm not seeing anything that consumes the DISP-S1 input file list, lineage list, or localize list. Presumably we store all that information for debugging purposes.
Those 3 lists repeat 4 times in total, and each list is about 500 lines of JSON, so that's roughly 6000 lines out of 6650 lines of JSON. If we drop 3 of the 4 copies, and also drop the localize and input file lists so that only one set of lineage remains, we end up with a ~1200-line JSON file, an 80% reduction in data volume. That would bring the end-of-production volume to about 200 GB instead of the previous 1 TB, which I think is reasonable.
It looks like there are 4 products in the JSON, and we are copy-pasting the identical runconfig into each of them. So perhaps we could get rid of the duplicated runconfigs altogether, gaining more space savings and perhaps making this process a little easier too. Clearly, the runconfig used to generate the png or iso_xml for the same product is identical.
"

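The volume estimate in the second Slack message can be reproduced with a few lines of arithmetic. This is a back-of-envelope sketch that uses only the approximate line counts quoted there. It assumes that storage volume scales linearly with JSON line count, which is an assumption rather than a measurement.

```python
# Back-of-envelope estimate of the DISP-S1 metadata reduction, using the
# approximate figures quoted in the Slack discussion.
TOTAL_LINES = 6650          # current metadata JSON size for one product (lines)
LINES_PER_LIST = 500        # input file list, lineage list, localize list: ~500 lines each
NUM_LISTS = 3
NUM_COPIES = 4              # one copy per product file (.nc, .png, .iso.xml, ...)
END_OF_PRODUCTION_TB = 1.0  # previously projected end-of-production volume

# Every copy of all three lists.
redundant_lines = NUM_LISTS * NUM_COPIES * LINES_PER_LIST
# Drop all of them, then add back a single lineage list.
remaining_lines = TOTAL_LINES - redundant_lines + LINES_PER_LIST
reduction = 1 - remaining_lines / TOTAL_LINES
# Assumption: volume scales linearly with JSON line count.
projected_gb = END_OF_PRODUCTION_TB * 1000 * (1 - reduction)

print(f"redundant lines:  {redundant_lines}")
print(f"remaining lines:  {remaining_lines}")
print(f"reduction:        {reduction:.0%}")
print(f"projected volume: ~{projected_gb:.0f} GB")
```

With these inputs, the script reports about 1150 remaining lines, an 83% reduction, and roughly 173 GB. The Slack message rounds these figures to ~1200 lines, 80%, and 200 GB.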
philipjyoon added the enhancement (New feature or request) and needs triage (Issue that requires triage) labels Dec 4, 2024
philipjyoon self-assigned this Dec 4, 2024
philipjyoon added a commit that referenced this issue Dec 5, 2024
…han the .nc file. Currently we are repeating the exact same metadata for .xml, .png, and so on that are unnecessary and makes the DB huge
philipjyoon added a commit that referenced this issue Dec 6, 2024
philipjyoon added a commit that referenced this issue Dec 6, 2024
philipjyoon added a commit that referenced this issue Dec 6, 2024
…de is hard to test so we need to commit, deploy, and then test
philipjyoon added a commit that referenced this issue Dec 10, 2024
philipjyoon added a commit that referenced this issue Dec 10, 2024
philipjyoon added a commit that referenced this issue Dec 11, 2024
philipjyoon added a commit that referenced this issue Dec 12, 2024
…re repetitive and take up large amount of space
philipjyoon added a commit that referenced this issue Dec 13, 2024
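According to the commit messages, the goal is to stop repeating the same metadata for the .xml, .png, and other files and keep it only on the .nc file. The following is a minimal sketch of that pruning step. The JSON layout (a `product_files` list whose entries have `file_name`, `lineage`, `input_files`, `localize`, and `runconfig` keys) and the function name `prune_product_metadata` are hypothetical, and the structure actually produced by product2dataset.py may differ. The sketch drops the input file and localize lists from every entry and keeps a single lineage list and runconfig on the .nc entry. It also refuses to discard a runconfig that differs from the .nc one, because the discussion assumes they are identical.

```python
import copy
import json
from typing import Any

# Hypothetical key names for illustration only; the real DISP-S1 dataset JSON
# may use a different layout.
FILES_KEY = "product_files"
DROP_EVERYWHERE = ("input_files", "localize")    # not consumed downstream
KEEP_ON_PRIMARY_ONLY = ("lineage", "runconfig")  # one copy is enough


def prune_product_metadata(metadata: dict[str, Any], primary_suffix: str = ".nc") -> dict[str, Any]:
    """Return a pruned copy of `metadata`; the input dict is left untouched."""
    pruned = copy.deepcopy(metadata)
    files = pruned[FILES_KEY]
    primary = next((f for f in files if f["file_name"].endswith(primary_suffix)), None)
    if primary is None:
        raise ValueError(f"no {primary_suffix} entry found in {FILES_KEY}")

    for entry in files:
        # Input file and localize lists are removed from every entry.
        for key in DROP_EVERYWHERE:
            entry.pop(key, None)
        if entry is primary:
            continue
        # Sidecar files (.png, .iso.xml, ...) should share the .nc runconfig;
        # fail loudly instead of silently discarding one that differs.
        if "runconfig" in entry and entry["runconfig"] != primary.get("runconfig"):
            raise ValueError(f"runconfig differs for {entry['file_name']}")
        for key in KEEP_ON_PRIMARY_ONLY:
            entry.pop(key, None)
    return pruned


# Made-up example with two of the per-file entries.
shared = {
    "lineage": ["input_a.h5", "input_b.h5"],
    "input_files": ["input_a.h5", "input_b.h5"],
    "localize": ["s3://example-bucket/input_a.h5"],
    "runconfig": {"frame_id": 1},
}
example = {
    FILES_KEY: [
        {"file_name": "example_product.nc", **shared},
        {"file_name": "example_product.png", **shared},
    ]
}
pruned = prune_product_metadata(example)
print(json.dumps(pruned, indent=2))
```

After pruning, the .nc entry keeps only `file_name`, `lineage`, and `runconfig`, and the .png entry keeps only `file_name`. The sketch raises an error on a mismatched runconfig instead of discarding it, because a difference would contradict the assumption behind the deduplication.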
Projects: None yet
Development: No branches or pull requests
2 participants