-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(ingest): Small ingest fixes for multi-segmented viruses #2316
Conversation
This doesn't make sense to me. I mean I get that maybe NCBI erroneously has different taxon_id s for different segments that should be grouped, but IMO then we should come up with a consensus taxon ID as part of ingest. (Actually is there any case in our current set of organisms where we expect the taxon ID to [meaningfully/usefully] vary within an organism, irrespective of segments?) |
This is apparently an influenza "feature" (or bug depending how you look at it)- different clades of different segments would sometimes have different specialized taxon_ids - so in influenza it should be a perSegment field however for CCHF and (I think?) most other viruses this should be the same for all segments... But yes maybe not good to have this as perSegment in our default configs then... I can remove it here and try to overwrite that field for influenza configs |
I couldn't find an example, at least in CCHF of where we currently group segments with different virus tax ids. Do you have concrete evidence for influenza where segments of the same virus have different taxon_ids? I might have been wrong when I suggested this and you might have taken my word for it. I'm not confident right now that this can indeed happen. For CCHF, there are a few non-majority tax ids, but so far they are all only on a single segment:
|
I definitely saw when I was running locally that the main reason for not grouping was different taxon id and that there were a lot of different taxon Ids - I will try to find an example. |
Woops, E2E fails. Here's my promised PR to use tsv-utils again, I recommend reading the docs, it's a neat tool! |
…(70 lines + extra file) (#2329) * This might work already * fix
As this PR now no longer changes how we handle |
I was going to merge but realized that cchf is duplicated on the preview - this might be our odd edge -case but I will relaunch and check if this happens again in an hour or so. |
IIUC, setting per segment can be done by downstream users if the need
arises. For default organisms it seems unneeded.
…On Tue, Jul 23, 2024, 17:55 Anna (Anya) Parker ***@***.***> wrote:
As this PR now no longer changes how we handle taxon_id I will merge and
we can discuss further on slack if setting this to perSegment is actually
required in cases such as influenza
—
Reply to this email directly, view it on GitHub
<#2316 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AF77AQMAQKT5BCYLHL43PT3ZNZ4HZAVCNFSM6AAAAABLIQZYUGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENBVGYZDQMRVG4>
.
You are receiving this because your review was requested.Message ID:
***@***.***>
|
duplication happened again with ebola - but as cchf reingest is working ok I think this is a side effect of: #1777 and will merge this branch |
resolves #
preview URL: https://ingest-fixes.loculus.org/
Summary
Small changes to display:
oneHeader
argument in values.yaml) to reduce busy-ness of page.Alignment grouping fix:
Small grouping changes/ Config changes
perSegment
andoneHeader
.