Add full provenance info to metadata.json files #411

Closed
shirey opened this issue Nov 14, 2023 · 4 comments

@shirey
Member

shirey commented Nov 14, 2023

Jonathan has requested that the full provenance metadata information be included in the metadata.json files that are currently being generated (which only include the actual data metadata that we obtain during dataset ingest).

Change the contents of the metadata.json file in both places where it is written: the endpoint that generates a metadata.json file for a dataset, and the endpoint that runs at publication time. Include the full JSON response of the entity-api /entities/<dataset id> endpoint, and add to it lists of the full entity-api responses for all associated donors, organs and samples.

See the example cypher below to obtain the uuids of these associated entities, then call the entity-api /entities/<entity id> endpoint for each uuid to get the JSON to include in the list for its type. Each type is included as an array, with one item per entity-api response, under the keys: donor, organ, sample.

Sample IDs:
match (ds:Dataset {uuid:'<dataset uuid>'})<-[*]-(s:Sample) where not s.sample_category = 'organ' return distinct s.uuid;

Organ IDs:
match (ds:Dataset {uuid:'<dataset uuid>'})<-[*]-(o:Sample) where o.sample_category = 'organ' return distinct o.uuid;

Donor IDs:
match (ds:Dataset {uuid:'<dataset uuid>'})<-[*]-(d:Donor) return distinct d.uuid;
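The three queries differ only in their node label and `sample_category` filter. As a rough illustration of that split, here is a minimal Python sketch that applies the same rules to ancestor nodes already loaded into memory. The `classify_ancestors` function and the dict shape (`label`, `uuid`, `sample_category`) are hypothetical and not part of ingest-api or entity-api. One detail it tries to mirror: in Cypher, a Sample whose `sample_category` is missing makes the `where` predicate evaluate to null, so that node would be returned by neither the sample query nor the organ query.

```python
from typing import Dict, Iterable, List, Optional


def classify_ancestors(ancestors: Iterable[dict]) -> Dict[str, List[str]]:
    """Group ancestor uuids the way the three cypher queries above would.

    Each ancestor is a hypothetical dict with the keys 'label' ('Sample' or
    'Donor'), 'uuid', and optionally 'sample_category'.
    """
    groups: Dict[str, List[str]] = {"sample": [], "organ": [], "donor": []}
    seen = set()
    for node in ancestors:
        uuid = node["uuid"]
        if uuid in seen:
            continue  # mirrors 'return distinct'
        seen.add(uuid)
        if node["label"] == "Donor":
            groups["donor"].append(uuid)
        elif node["label"] == "Sample":
            category: Optional[str] = node.get("sample_category")
            if category == "organ":
                groups["organ"].append(uuid)
            elif category is not None:
                # 'not s.sample_category = "organ"' is null (filtered out)
                # when the property is missing, so None lands in no group.
                groups["sample"].append(uuid)
    return groups
```

In practice the grouping would more likely stay in neo4j, as in the queries above; the sketch is only meant to make the filter rules explicit.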

Also of note: the entity-api endpoint /datasets/<dataset id>/organs could be used to get the organ information, but similar endpoints don't exist for Sample and Donor. A new endpoint will be created in entity-api to retrieve the 'organ', 'donor', and 'sample' uuids (see: hubmapconsortium/entity-api#604).

In summary, what we need written to the metadata.json file is the standard entity-api JSON response for /entities/<dataset id> with arrays added for each of donors, organs and samples, instead of just dataset.ingest_metadata.metadata, which is all that is written now.
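A minimal sketch of how such a document might be assembled, assuming the related uuids have already been looked up. The function names, the `fetch_entity` callable (standing in for a call to entity-api /entities/<entity id>), and the plural key names `donors`, `organs`, `samples` are assumptions for illustration; the issue text also lists the keys in singular form, so the real key names should be taken from the attached example file.

```python
import json
from typing import Callable, Dict, List

# Hypothetical stand-in for a GET to entity-api /entities/<entity id>.
FetchEntity = Callable[[str], dict]


def build_metadata_document(
    dataset_uuid: str,
    related_uuids: Dict[str, List[str]],
    fetch_entity: FetchEntity,
) -> dict:
    """Return the dataset entity JSON with donor/organ/sample arrays added."""
    document = dict(fetch_entity(dataset_uuid))  # copy; do not mutate the source
    for key in ("donors", "organs", "samples"):  # assumed key names
        document[key] = [fetch_entity(u) for u in related_uuids.get(key, [])]
    return document


def write_metadata_json(document: dict, path: str) -> None:
    """Serialize the assembled document to a metadata.json file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(document, handle, indent=2)
```

Passing the fetch function in as a parameter keeps the assembly logic independent of how entity-api is actually called, which also makes it easy to exercise with canned responses.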

Attached example metadata.json for fa238ec2a83302fb4af92442c3683a23, with added donors, samples and organs

fa238ec2a83302fb4af92442c3683a23-sample-metadata.json

Remember that this change must be effected in two endpoints in ingest-api:

  1. the endpoint that publishes
  2. the endpoint that creates the metadata.json file

Remember to make the necessary changes to the OpenAPI.yaml for ingest-api.

@ChuckKollar
Contributor

ChuckKollar commented Jan 25, 2024

See Entity Issue: hubmapconsortium/entity-api#604 and PR: hubmapconsortium/entity-api#612

Associated PR: #388 (Issue: #375), and commit: 6244ccd

@shirey shirey changed the title from "Add full provenance info to metadata.tsv files" to "Add full provenance info to metadata.json files" Jan 25, 2024
@yuanzhou
Member

yuanzhou commented Feb 7, 2024

@ChuckKollar based on our conversation, here is what I think will be a simpler and more efficient solution:

  • In entity-api, create two new endpoints, /datasets/<id>/donors and /datasets/<id>/samples. The implementation is similar to the existing /datasets/<id>/organs, but with modified neo4j queries so that we return all the entity dicts rather than just uuids. We can add filtering to return only uuids later if needed; that will be very easy.

  • In the ingest-api, you'll then just need to call the three endpoints in entity-api to get back all prov info of donors, samples, and organs to be included in the final response.
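The two steps above could look roughly like the following on the ingest-api side. This is a sketch under stated assumptions, not the code from the linked PRs: `collect_provenance` and the `get_json` callable (a stand-in for an authenticated GET against entity-api that returns parsed JSON) are hypothetical, each of the three dataset endpoints is assumed to return a list of entity dicts, and the output key names are assumed to match the endpoint names.

```python
from typing import Any, Callable

# Hypothetical stand-in: takes an entity-api path, returns the parsed JSON body.
GetJson = Callable[[str], Any]


def collect_provenance(dataset_id: str, get_json: GetJson) -> dict:
    """Merge the dataset entity with its donors, samples and organs."""
    result = dict(get_json(f"/entities/{dataset_id}"))
    for kind in ("donors", "samples", "organs"):
        # Assumes each endpoint returns a list of full entity dicts.
        result[kind] = list(get_json(f"/datasets/{dataset_id}/{kind}"))
    return result
```

Compared with looking up uuids and then fetching each entity one by one, this needs only four requests per dataset regardless of how many ancestors it has.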

@ChuckKollar
Contributor

PR: #481

@ChuckKollar
Contributor

ChuckKollar commented Feb 19, 2024

Move writing of metadata.json to the end of dataset publish endpoint

PR: #497

@shirey shirey closed this as completed Feb 20, 2024
@shirey shirey added this to Pitt HIVE Jun 7, 2024
@shirey shirey moved this to Done in Pitt HIVE Jun 7, 2024
3 participants