
Some replicas are still missing for HCA #6597

Open
nadove-ucsc opened this issue Sep 28, 2024 · 1 comment
Labels
- "-": [priority] Medium
- groomed: [process] Issue was recently looked at during backlog grooming
- indexer: [subject] The indexer part of Azul
- orange: [process] Done by the Azul team

Comments

@nadove-ucsc
Contributor

nadove-ucsc commented Sep 28, 2024

Follow-up from #6582

The linked issue adds replicas that were previously missing for many HCA entities, such as donors and some protocols. However, there are still HCA entities that are not being replicated. There are two distinct cases:

  1. Entities that are linked to a file, but are not replicated because they are not tracked while traversing the links. An example is the dissociation_protocol in canned bundle aaa96233-bf27-44c7-82df-b4dc15ad4d9d.
  2. Entities that are not linked to any file in their bundle.

The solution for case 1 is to modify the TransformerVisitor class to track all linked entities it encounters, potentially consolidating all currently untracked entities in a single data structure. These entities will then be emitted as replicas by the FileTransformer.
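A minimal sketch of the case 1 idea, assuming a simple graph model of a bundle: the visitor records every entity it reaches while walking links from a file, grouped by entity type in one mapping, instead of tracking only selected types. `Entity`, `TrackingVisitor` and `traverse` are hypothetical names for illustration and do not reflect the actual `TransformerVisitor` API; the link graph in the demo is made up.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for an HCA metadata entity in a bundle."""
    entity_type: str
    entity_id: str


@dataclass
class TrackingVisitor:
    """Hypothetical visitor that records every linked entity it encounters,
    grouped by entity type, in a single data structure."""
    tracked: defaultdict = field(default_factory=lambda: defaultdict(dict))

    def visit(self, entity: Entity) -> None:
        # Track the entity regardless of its type
        self.tracked[entity.entity_type][entity.entity_id] = entity

    def entities(self) -> list[Entity]:
        # Flatten the per-type mappings into one list of entities
        return [e for by_id in self.tracked.values() for e in by_id.values()]


def traverse(file: Entity,
             links: dict[Entity, list[Entity]],
             visitor: TrackingVisitor) -> None:
    """Walk the link graph from a file (included), visiting each reachable
    entity exactly once, even if the graph contains cycles."""
    stack = [file]
    seen: set[Entity] = set()
    while stack:
        entity = stack.pop()
        if entity in seen:
            continue
        seen.add(entity)
        visitor.visit(entity)
        stack.extend(links.get(entity, []))


if __name__ == '__main__':
    # Made-up link graph: file -> process -> dissociation_protocol
    file = Entity('sequence_file', 'file-1')
    process = Entity('process', 'process-1')
    protocol = Entity('dissociation_protocol', 'protocol-1')
    visitor = TrackingVisitor()
    traverse(file, {file: [process], process: [protocol]}, visitor)
    print(sorted(e.entity_type for e in visitor.entities()))
    # prints ['dissociation_protocol', 'process', 'sequence_file']
```

Everything the visitor collects could then be handed to the FileTransformer to emit as replicas, so no entity reachable from a file is silently dropped.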

The solution for case 2 is to modify the ProjectTransformer to emit a replica for every entity in its bundle. The hub IDs for these replicas will not include any file IDs. Duplicate replicas will be merged by the index service before any replicas are written to Elasticsearch.
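A rough sketch of the deduplication step, assuming a replica is identified by its type and entity ID and that merging duplicates means taking the union of their hub IDs. `Replica` and `merge_replicas` are hypothetical, and the index service's actual merge logic may differ. For simplicity, the project-emitted replicas in the demo carry empty hub ID sets.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Replica:
    """Hypothetical minimal replica: identity, contents and hub IDs."""
    replica_type: str
    entity_id: str
    contents: str
    hub_ids: frozenset[str]


def merge_replicas(replicas: list[Replica]) -> list[Replica]:
    """Collapse replicas of the same entity into one, uniting their hub IDs.
    Output order follows the first occurrence of each entity."""
    merged: dict[tuple[str, str], Replica] = {}
    for replica in replicas:
        key = (replica.replica_type, replica.entity_id)
        existing = merged.get(key)
        if existing is None:
            merged[key] = replica
        elif existing.contents != replica.contents:
            # Assumption: duplicates of one entity always carry identical contents
            raise ValueError(f'Conflicting contents for {key}')
        else:
            merged[key] = replace(existing,
                                  hub_ids=existing.hub_ids | replica.hub_ids)
    return list(merged.values())


if __name__ == '__main__':
    # One replica from a file-centric transformer, two from the project
    from_file = Replica('protocol', 'p1', '{}', frozenset({'file-1'}))
    from_project = [Replica('protocol', 'p1', '{}', frozenset()),
                    Replica('protocol', 'p2', '{}', frozenset())]
    for r in merge_replicas([from_file, *from_project]):
        print(r.entity_id, sorted(r.hub_ids))
```

Here `p1` keeps its file hub ID after merging, while `p2`, which no file links to, still gets a replica.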

This design depends on the current implementation of the linked ticket, as in #6584.

@nadove-ucsc added the orange, bug, indexer and "-" labels Sep 28, 2024
@hannes-ucsc
Member

hannes-ucsc commented Dec 17, 2024

We need to fix this. The proposed solution is reasonable. But since replicas for HCA are not an official deliverable, we shouldn't prioritize it.

@hannes-ucsc removed their assignment Dec 17, 2024
@hannes-ucsc changed the title from "Replicas are still missing for HCA" to "Some replicas are still missing for HCA" Jan 6, 2025
nadove-ucsc added a commit that referenced this issue Jan 29, 2025
nadove-ucsc added a commit that referenced this issue Jan 29, 2025
nadove-ucsc added a commit that referenced this issue Jan 30, 2025
dsotirho-ucsc pushed a commit that referenced this issue Jan 30, 2025
@hannes-ucsc added the groomed label Feb 7, 2025
@achave11-ucsc removed the bug label Feb 11, 2025