Re-iding tool for Napa compliance #37

Merged on Jan 16, 2024 (101 commits). Changes shown are from 1 commit.

Commits
12b4117
add scripts for reiding
Oct 26, 2023
913262d
update .gitignore to exclude pycharm project files
mbthornton-lbl Oct 26, 2023
04cc8ed
initial script framework, Napa config file, and run_query method on A…
mbthornton-lbl Oct 26, 2023
1b118ba
added api_url reference
Oct 26, 2023
4ea1cf3
minimal config for reiding workflows
Oct 26, 2023
2108890
added activity record and data object creation
Oct 26, 2023
444ee0d
Add basic RuntimeUserApi client, and Napa specific configurations
mbthornton-lbl Oct 26, 2023
f06d731
Update rebuild_metagenome script - successfully calls queries:run end…
mbthornton-lbl Oct 26, 2023
4c88894
ignore lock files
mbthornton-lbl Oct 26, 2023
fa51362
Merge branch 're_iding' of https://github.com/microbiomedata/nmdc_aut…
mbthornton-lbl Oct 26, 2023
0188da5
refactor get_omics_processing_records to the API client
mbthornton-lbl Oct 26, 2023
3d2144b
update script to find legacy IDs
mbthornton-lbl Oct 26, 2023
5551b46
basic script framework - finds reads QC and downstream workflow activ…
mbthornton-lbl Oct 27, 2023
f6a7908
added write out to json for study
Oct 28, 2023
e9e7267
rename file, add database object post
Oct 28, 2023
219b007
rename file, move all file operation here
Oct 28, 2023
0481caa
fixes for #23 Add read_qc_analysis_activity_set
mbthornton-lbl Oct 31, 2023
53b8c52
added json reader
Oct 31, 2023
e06b496
started to break down workflows updating process
Oct 31, 2023
2196856
Change script output to a list of serialized Database instances
mbthornton-lbl Oct 31, 2023
127625e
Merge branch 're_iding' of https://github.com/microbiomedata/nmdc_aut…
mbthornton-lbl Oct 31, 2023
0538575
added dry run for reads qc transformation
Nov 1, 2023
f59486a
added ActivityRange for workflows
Nov 1, 2023
966d367
remove reads_based_analysis_acrtivity_set
mbthornton-lbl Nov 1, 2023
d5744fa
Merge branch 're_iding' of https://github.com/microbiomedata/nmdc_aut…
mbthornton-lbl Nov 1, 2023
ec7b528
remove unused functions
Nov 2, 2023
d3b1a30
helper bash script to rename bam
Nov 2, 2023
789463d
added operations for assembly
Nov 2, 2023
fc3b0d0
added support for assembly and readbased
Nov 2, 2023
b807215
find bam_script locally
Nov 2, 2023
8e53ff8
reformat
Nov 2, 2023
980a3a2
remove read_based_taxonomy_analysis
mbthornton-lbl Nov 2, 2023
8c4de1b
Merge branch 're_iding' of https://github.com/microbiomedata/nmdc_aut…
mbthornton-lbl Nov 2, 2023
8386780
fix file path
Nov 3, 2023
62c5dde
remove superflous debugging
Nov 3, 2023
19f1d5f
add back read_based_taxonomy_analysis_activity_set
Nov 3, 2023
2960932
change centrifuge report do type
Nov 3, 2023
b88aa57
add example record
Nov 6, 2023
d5f113d
added better logging
Nov 6, 2023
5cb1b7b
Rename script
mbthornton-lbl Nov 7, 2023
f38ee0e
update to extract data object for omics processing has_output
mbthornton-lbl Nov 7, 2023
f33b6ce
updated with reabased_taxonomy
Nov 7, 2023
b5259ac
Merge branch 're_iding' of https://github.com/microbiomedata/nmdc_aut…
Nov 7, 2023
50aad19
add single-record extract output for testing purposes
mbthornton-lbl Nov 7, 2023
1d78ce9
Merge branch 're_iding' of https://github.com/microbiomedata/nmdc_aut…
mbthornton-lbl Nov 7, 2023
bfac07b
added click argument for process_analysis_set
Nov 7, 2023
19c55f9
Merge branch 're_iding' of https://github.com/microbiomedata/nmdc_aut…
Nov 7, 2023
5da6522
update to search for orphaned data records and include in output
mbthornton-lbl Nov 7, 2023
c6eb2e8
Merge branch 're_iding' of https://github.com/microbiomedata/nmdc_aut…
mbthornton-lbl Nov 7, 2023
d3ee063
Begin refactoring
mbthornton-lbl Nov 10, 2023
fa5df3d
add omics processing update method and basic unit test
mbthornton-lbl Nov 10, 2023
360c37e
update re-id tool
mbthornton-lbl Nov 13, 2023
68cd47a
_make_new_data_object
mbthornton-lbl Nov 13, 2023
dff9726
Consolidate record extraction script to re_id_tool.py
mbthornton-lbl Nov 13, 2023
92f18b6
normalize QC vs Qc and output json for re-ided records
mbthornton-lbl Nov 13, 2023
e0634ac
update readQC specific properties
mbthornton-lbl Nov 13, 2023
1354a52
add update_metagenome_assembly_set
mbthornton-lbl Nov 13, 2023
88e7471
sketch in update read based analysis method
mbthornton-lbl Nov 13, 2023
fe3f3d9
Handle missing data_object_type
mbthornton-lbl Nov 13, 2023
c78dcb2
add data_object_type to new data objects
mbthornton-lbl Nov 13, 2023
38c2dd7
exteded file operations for computing new paths and assembly operations
Nov 13, 2023
e46924d
added file operations and fixed url names
Nov 13, 2023
0412ed0
pass data_object_type into assembly_file_operations to avoid records …
Nov 13, 2023
7497353
pass data_object_type directly to assembly_file_operations
Nov 13, 2023
5c8a4f0
Added with file operations
Nov 13, 2023
62987ae
localize data file ops and fix output serialization
mbthornton-lbl Nov 15, 2023
40e5ce5
updated output dump
mbthornton-lbl Nov 15, 2023
1a1a8ab
updated script and output
mbthornton-lbl Nov 15, 2023
c38070a
Create dryrun_associated_record_dump.json
mbthornton-lbl Nov 15, 2023
dcda603
local input files for dryrun
mbthornton-lbl Nov 15, 2023
22d0fac
fixed has_input for assembly and readbased
Nov 15, 2023
6d3d81f
updated has_inputs
Nov 15, 2023
eee1b50
Update new data object URLs
mbthornton-lbl Nov 15, 2023
f8236fc
Merge branch 're_iding' of https://github.com/microbiomedata/nmdc_aut…
mbthornton-lbl Nov 15, 2023
0b32b07
updated processed dry run output
mbthornton-lbl Nov 15, 2023
609d71c
extracted workflow records for Gs0114675 sty-11-8ft6t785
mbthornton-lbl Nov 15, 2023
7fc1ef7
delete dry-run output data files
mbthornton-lbl Nov 16, 2023
0426dd2
added iteration to activity ids
Nov 16, 2023
6b06e68
fix id iteration and name slot
Nov 17, 2023
03bff67
change Dry run outdir for test with real files
Nov 17, 2023
6d9945e
dry run reflecting files on nersc
Nov 17, 2023
836a0ca
added command to ingest records
Nov 17, 2023
ae75309
update to reflect proper versions
Nov 20, 2023
da59e12
added full stegen re-ided records
Nov 20, 2023
c13e93f
add delete method
Nov 20, 2023
722f6ce
changed versions
Nov 20, 2023
a06695d
changed versions
Nov 20, 2023
db5581a
exclude omics_processing_set from ingest
Nov 21, 2023
35cae66
add changesheet-only logic to re-id-ing ingest_records
mbthornton-lbl Nov 21, 2023
c76f63b
split changesheet to test api
mbthornton-lbl Nov 29, 2023
b4de0b5
added function to delete old records from db
Dec 7, 2023
9fb3f27
added method for /qeuries/run
Dec 7, 2023
d09597d
fixed query key for deleting functional agg record
Dec 7, 2023
e4fd027
added try-except block to catch errors
Dec 7, 2023
afb0a74
fixed functional annotation agg
Dec 19, 2023
6c26ca9
Merge branch 'main' into re_iding
mbthornton-lbl Jan 8, 2024
65fa67e
update make test to use poetry
mbthornton-lbl Jan 9, 2024
e1161d8
poetry add flake8
mbthornton-lbl Jan 9, 2024
0a08bf0
update poetry env somehow dropped pandas
mbthornton-lbl Jan 9, 2024
27c80b9
update reqs somehow dropped from poetry env
mbthornton-lbl Jan 9, 2024
e7596c7
add pytest-local-badge
mbthornton-lbl Jan 9, 2024
refactor get_omics_processing_records to the API client
mbthornton-lbl committed Oct 26, 2023
commit 0188da5984565ca4afa50558f56c71b385af45b0
15 changes: 15 additions & 0 deletions nmdc_automation/api/nmdcapi.py
@@ -371,6 +371,21 @@ def request(self, method, url_path, params_or_json_data=None):
         rv.raise_for_status()
         return rv

+    def get_omics_processing_records_for_nmdc_study(self, nmdc_study_id: str):
+        """
+        Retrieve all OmicsProcessing records for the given NMDC study ID.
+        """
+        url = "queries:run"
+        params = {"find": "omics_processing_set",
+                  "filter": {"part_of": {"$elemMatch": {"$eq": nmdc_study_id}}}}
+        response = self.request("POST", url, params_or_json_data=params)
+        if response.status_code != 200:
+            raise Exception(
+                f"Error retrieving OmicsProcessing records for study {nmdc_study_id}"
+            )
+        omics_processing_records = response.json()["cursor"]["firstBatch"]
+        return omics_processing_records
+
 def jprint(obj):
     print(json.dumps(obj, indent=2))

@@ -50,21 +50,12 @@ def rebuild_workflow_records(study_id: str, site_config: bool):
         username=config.napa_username, password=config.napa_password,
         base_url=config.napa_base_url, )

-    # 1. Retrieve all OmicsProcessing records for the given GOLD study ID
-    url = "queries:run"
-    params = {"find": "omics_processing_set",
-              "filter": {"part_of": {"$elemMatch": {"$eq": study_id}}}}
-    response = query_api_client.request("POST", url, params_or_json_data=params)
-    if response.status_code != 200:
-        raise Exception(
-            f"Error retrieving OmicsProcessing records for study {study_id}"
-        )
-    omics_processing_records = response.json()["cursor"]["firstBatch"]
+    # 1. Retrieve all OmicsProcessing records for the updated NMDC study ID
+    omics_processing_records = query_api_client.get_omics_processing_records_for_nmdc_study(study_id)
     logging.info(
         f"Retrieved {len(omics_processing_records)} OmicsProcessing records for study {study_id}"
     )

-    # 2. For each OmicsProcessing record, retrieve the corresponding
+    # 2. For each OmicsProcessing record, retrieve the informed_by records


 if __name__ == "__main__":
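
Moving the query into the API client also makes it easy to exercise without a live endpoint. The following is a minimal sketch of that idea, not code from this PR: `build_omics_processing_query`, `FakeResponse`, and `FakeClient` are hypothetical names, and the record IDs are made up. Only the `queries:run` path, the payload shape, and the `cursor` / `firstBatch` parsing are taken from the diff above.

```python
def build_omics_processing_query(nmdc_study_id: str) -> dict:
    """Build the queries:run payload shown in the diff (find + part_of filter)."""
    return {
        "find": "omics_processing_set",
        "filter": {"part_of": {"$elemMatch": {"$eq": nmdc_study_id}}},
    }


class FakeResponse:
    """Hypothetical stand-in for an HTTP response object; not part of the PR."""

    def __init__(self, payload: dict, status_code: int = 200):
        self._payload = payload
        self.status_code = status_code

    def json(self) -> dict:
        return self._payload


class FakeClient:
    """Hypothetical in-memory client with the same request() signature as the diff."""

    def __init__(self, records: list):
        self.records = records
        self.calls = []  # every request is recorded so tests can inspect it

    def request(self, method, url_path, params_or_json_data=None):
        self.calls.append((method, url_path, params_or_json_data))
        # Emulate the $elemMatch/$eq filter: keep records whose part_of
        # list contains the requested study ID.
        study_id = params_or_json_data["filter"]["part_of"]["$elemMatch"]["$eq"]
        batch = [r for r in self.records if study_id in r.get("part_of", [])]
        return FakeResponse({"cursor": {"firstBatch": batch}})

    def get_omics_processing_records_for_nmdc_study(self, nmdc_study_id: str):
        # Same flow as the method added in the diff, using the helper above.
        response = self.request(
            "POST",
            "queries:run",
            params_or_json_data=build_omics_processing_query(nmdc_study_id),
        )
        if response.status_code != 200:
            raise Exception(
                f"Error retrieving OmicsProcessing records for study {nmdc_study_id}"
            )
        return response.json()["cursor"]["firstBatch"]
```

One thing worth checking separately: in MongoDB's `find` command response, `firstBatch` is only the first batch of a cursor. Whether this endpoint always returns every matching record in that batch is not stated in the PR, so large studies may need a follow-up.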