Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(prepro): Allow nextclade metadata fields to be per_segment #2108

Merged
merged 9 commits into from
Jun 10, 2024

Conversation

anna-parker
Copy link
Contributor

@anna-parker anna-parker commented Jun 7, 2024

preview URL: https://prepro-fix.loculus.org/

Fixes

In #2100 I mentioned that there is a bug in the prepro code when processing multi-segment nextclade alignments.

When all segments are merged:
image
When the L segment is uploaded individually:
image
Basically these alignment metrics need to be annotated as per_segment.

Summary

  • The PR allows us to have metadata fields that were created by nextclade also per_segment. In order to do this the per_segment parameter must also be added as a field to the preprocessing arguments. For example:
- name: total_unknown_nucs
        type: int
        header: "Alignment states and QC metrics"
        noInput: true
        rangeSearch: true
        per_segment: true
        preprocessing:
          args: { type: int, per_segment: true }
          inputs: { input: nextclade.totalMissing }
  • I then adjusted the prepro code to use this argument when updating the metadata fields.
  • This PR also cleans up the prepro file a bit by moving some functions to another script as prepro is getting a bit too large.

Screenshot

image

@anna-parker anna-parker added the preview Triggers a deployment to argocd label Jun 7, 2024
@anna-parker anna-parker marked this pull request as ready for review June 7, 2024 14:16
@anna-parker anna-parker changed the title Prepro fix fix(prepro): Allow nextclade metadata fields to be per_segment Jun 7, 2024
@anna-parker anna-parker added the review please PR waiting for final review label Jun 10, 2024
@corneliusroemer
Copy link
Contributor

corneliusroemer commented Jun 10, 2024

This looks great! Thanks for tackling!

I've opened an issue #2115 to improve how we show per-segment metadata in a more structure way.

I've also formatted with ruff (it didn't seem to have been) and fixed mypy errors. These might have already pre-existed from the cchf merge (I haven't yet reviewed the prepro part of the code).

See commit 8d4cb27 for my changes

@corneliusroemer
Copy link
Contributor

I just realized this PR makes the metadata spec very undry - I don't think this is necessary:

The PR allows us to have metadata fields that were created by nextclade also per_segment. In order to do this the per_segment parameter must also be added as a field to the preprocessing arguments.

@corneliusroemer
Copy link
Contributor

Simplification attempt: https://github.com/loculus-project/loculus/pull/2131/files

@corneliusroemer corneliusroemer merged commit cd065c9 into main Jun 10, 2024
9 checks passed
@corneliusroemer corneliusroemer deleted the prepro_fix branch June 10, 2024 20:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preview Triggers a deployment to argocd review please PR waiting for final review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants