Derek furst/multiple components datasets #545

Merged · 17 commits · Oct 16, 2023

Conversation

DerekFurstPitt (Contributor)

Implemented the datasets/components endpoint to register multiple component datasets from a multi-assay ancestor. These entities are not created through the typical entity-creation process: we can't use the normal triggers because those are configured to work with a single ancestor per child dataset, so all of the validation is handled within this endpoint. New functions were created for activity creation and for linking to the direct ancestor. For now we only allow a single parent ancestor, but the functions were written to accommodate multiple ancestors. The main difference between this schema function and the existing ones is that it accepts multiple child entities rather than just one. The activity created is of the type Multi-Assay Split.
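As an illustration, a request to this endpoint might carry a body shaped roughly like the following. The field names here are illustrative stand-ins, not the actual entity-api contract:

```python
# Hypothetical request body for the datasets/components endpoint.
# Field names are illustrative only; consult the entity-api schema
# for the real contract.
payload = {
    # the single multi-assay ancestor allowed for now
    "direct_ancestor_uuids": ["<multi-assay-ancestor-uuid>"],
    # multiple child component datasets registered in one call
    "datasets": [
        {"dataset_type": "RNAseq", "lab_dataset_id": "component-1"},
        {"dataset_type": "ATACseq", "lab_dataset_id": "component-2"},
    ],
}
```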

Commit messages (truncated in GitHub's display):

- "…d. For now, returns uuids of the entities created. containing both datasets. Still need to apply the same modifications and read triggers done with create_entity. Creation action field is not currently supported in the triggers. Need to merge in changes from creation_action branch"
- "…has its support merged in. Normalized output. Added reindex call like is done on a normal create entity request."
- "…eation action from the dataset since this is set manually at activity creation time. Replaced json_data_dict with dataset where it was put mistakenly during validation."
@yuanzhou (Member) left a comment:

@DerekFurstPitt the way you handle this specialized endpoint with mixed use of schema components can be simplified and the SenNet project will appreciate this effort too. I'll describe the steps:

  1. First, check the required X-Hubmap-Application header, and group all the input JSON validations at the beginning without using schema_manager.validate_json_data_against_schema(). The purpose is to ensure data integrity by checking the required fields and their values for existence, size, and type.

  2. Similar to create_multiple_samples_details(), create two sets of IDs (which also validates the group_uuid comprehensively) via uuid-api using schema_manager.create_hubmap_ids(), one for each component dataset. Then build two merged dicts and call schema_manager.generate_triggered_data() on each to generate all the dataset node properties via the schema.

  3. As you've already done, generate one Activity node to be linked to the direct ancestor dataset and the two new component datasets. This is also the time to set Activity.creation_action to "Multi-Assay Split", as you did.

  4. Create an app-specific Neo4j query similar to app_neo4j_queries.create_multiple_samples() and write everything (nodes and relationships) into Neo4j, without using the schema.

  5. Add the new datasets to the indices via reindex, and return the desired data structure to ingest-api, just as you did.

By following the above approach, we'll have one specialized endpoint that handles the custom JSON input with custom validations, plus one app-specific Neo4j query.
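The five steps above can be sketched in outline as a single function. Every helper here (mint_ids, generate_node_properties, create_in_neo4j, reindex) is a hypothetical stand-in for the real schema_manager / app_neo4j_queries internals, not the actual entity-api code:

```python
"""Illustrative outline of the five-step flow; all helpers are
hypothetical stand-ins for entity-api internals."""
import uuid


def mint_ids(component):
    # Stand-in for schema_manager.create_hubmap_ids(), which also
    # validates group_uuid comprehensively via uuid-api.
    return {"uuid": uuid.uuid4().hex}


def generate_node_properties(merged):
    # Stand-in for schema_manager.generate_triggered_data(), which
    # fills in the schema-driven dataset node properties.
    return dict(merged)


def create_in_neo4j(ancestor_uuid, activity, datasets):
    # Stand-in for one app-specific query analogous to
    # app_neo4j_queries.create_multiple_samples(): a single
    # transaction writes the Activity node, both dataset nodes,
    # and all relationships, without using the schema.
    pass


def reindex(dataset_uuid):
    # Stand-in for the reindex call that adds the dataset to the
    # search indices.
    pass


def register_multiple_components(json_data, ancestor_uuid):
    # Step 1: inline validation of the custom input (no
    # schema_manager.validate_json_data_against_schema()).
    components = json_data.get("datasets", [])
    if len(components) != 2:
        raise ValueError("exactly two component datasets are expected")

    # Step 2: mint IDs, build merged dicts, generate node properties.
    datasets = []
    for component in components:
        merged = {**component, **mint_ids(component)}
        datasets.append(generate_node_properties(merged))

    # Step 3: one Activity node, with creation_action set manually
    # at activity-creation time.
    activity = {"creation_action": "Multi-Assay Split"}

    # Step 4: write all nodes and relationships in one query.
    create_in_neo4j(ancestor_uuid, activity, datasets)

    # Step 5: reindex and return the new dataset identifiers.
    for ds in datasets:
        reindex(ds["uuid"])
    return [ds["uuid"] for ds in datasets]
```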

@maxsibilla left a comment:

I think this addresses the major points that we all discussed. I believe you also addressed all the issues @yuanzhou mentioned in the card, but I will leave that to him to confirm.

@yuanzhou (Member) left a comment:

@DerekFurstPitt the rework looks great! In addition to updating the comments to indicate the correct return results, can you also sync the latest main into your branch? When I tested your branch locally, I ran into a dependency issue that had already been addressed in main.
