43 update re iding process records to handle data objects wo data file description or url or with ID not found in the Database #46

@mbthornton-lbl mbthornton-lbl commented Jan 31, 2024

This PR provides several updates to the commands in re_id_tool.py:

  • Check for WorkflowRecords with orphaned DataObject IDs in has_output and write them to a failed_record_dump outfile
  • Add an update-links option to process-records so that it can be tested locally without filesystem changes
  • Handle OmicsProcessing records with no has_output values
  • Improve logging

Additionally, this PR includes the results and logs for extract-records and process-records on:

  • nmdc:sty-11-33fbta56 (Spruce)
  • nmdc:sty-11-547rwq94 (EMP)
  • nmdc:sty-11-076c9980 (Luquillo)
  • nmdc:sty-11-dcqce727 (Crested Butte)

Workflows with an "orphan" data object ID (one that cannot be found in our DB or in any of the data_objects.json files) are written to a failed record dump, along with the rest of that Workflow's has_output data objects. If the failing Workflow is ReadQC, all workflows are failed.
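The orphan handling described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in re_id_tool.py: the function name `split_orphaned_workflows`, the `READ_QC_TYPE` label, and the dict-shaped workflow records are assumptions made for the example, and "all workflows" is taken here to mean every workflow in the list passed in.

```python
from typing import Dict, List, Set, Tuple

# Assumed type label for ReadQC workflows; the real value may differ.
READ_QC_TYPE = "nmdc:ReadQcAnalysisActivity"


def split_orphaned_workflows(
    workflows: List[Dict],
    known_data_object_ids: Set[str],
) -> Tuple[List[Dict], List[Dict]]:
    """Split workflows into (passed, failed) based on orphan has_output IDs.

    A workflow fails if any ID in its has_output is not a known data object.
    If a failing workflow is ReadQC, every workflow in the list is failed.
    """
    passed: List[Dict] = []
    failed: List[Dict] = []
    for workflow in workflows:
        # Records without has_output are treated as having no outputs.
        outputs = workflow.get("has_output") or []
        orphans = [do_id for do_id in outputs if do_id not in known_data_object_ids]
        if not orphans:
            passed.append(workflow)
            continue
        if workflow.get("type") == READ_QC_TYPE:
            # A failed ReadQC invalidates everything downstream of it.
            return [], list(workflows)
        failed.append(workflow)
    return passed, failed
```

The failed list would then be written to the failed record dump, with each failed workflow keeping its full has_output list so the remaining data objects are dumped alongside the orphan.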

@mbthornton-lbl mbthornton-lbl marked this pull request as ready for review February 8, 2024 16:49
@mbthornton-lbl mbthornton-lbl requested a review from aclum February 8, 2024 16:50
@mbthornton-lbl mbthornton-lbl marked this pull request as draft February 9, 2024 18:45
@Michal-Babins Michal-Babins left a comment

Added methods to deal with unlinked DOs and to compute compliant file paths. Extended the Analysis activity set. Generated new data JSON dumps.

@mbthornton-lbl mbthornton-lbl marked this pull request as ready for review February 13, 2024 02:10
@mbthornton-lbl mbthornton-lbl changed the title from "43 update re iding process records to handle data objects wo data file description or url" to "43 update re iding process records to handle data objects wo data file description or url or with ID not found in the Database" Feb 13, 2024
@Michal-Babins Michal-Babins left a comment

Implemented a method to get has_input for a legacy ID, added a method to fail an OmicsProcessing record if a DO is orphaned, and added a check for the read-based analysis activity type.

@mbthornton-lbl mbthornton-lbl merged commit efe919b into main Feb 13, 2024
1 check passed