Vv ensembl dev susmi #615

Open
wants to merge 92 commits into
base: ensembl_update_2024
from


Peter-J-Freeman
Collaborator

No description provided.

Peter J. Freeman and others added 30 commits July 7, 2022 09:34
commits that tackle what I saw for issue #387. It seems the requested …
Hide more recent version and not part of genome build warnings for irrelevant transcripts

codecov bot commented Sep 10, 2024

Codecov Report

Attention: Patch coverage is 77.53744% with 135 lines in your changes missing coverage. Please review.

Please upload report for BASE (ensembl_update_2024@9156d2c). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| VariantValidator/modules/vvMixinCore.py | 62.74% | 38 Missing ⚠️ |
| VariantValidator/modules/complex_descriptions.py | 76.28% | 23 Missing ⚠️ |
| VariantValidator/modules/gapped_mapping.py | 84.15% | 16 Missing ⚠️ |
| VariantValidator/modules/format_converters.py | 74.57% | 15 Missing ⚠️ |
| VariantValidator/modules/utils.py | 72.50% | 11 Missing ⚠️ |
| VariantValidator/modules/mappers.py | 67.85% | 9 Missing ⚠️ |
| VariantValidator/modules/vvMixinConverters.py | 89.33% | 8 Missing ⚠️ |
| VariantValidator/modules/vvDatabase.py | 12.50% | 7 Missing ⚠️ |
| VariantValidator/modules/hgvs_utils.py | 79.16% | 5 Missing ⚠️ |
| VariantValidator/modules/use_checking.py | 91.17% | 3 Missing ⚠️ |
Additional details and impacted files
@@                  Coverage Diff                   @@
##             ensembl_update_2024     #615   +/-   ##
======================================================
  Coverage                       ?   74.13%           
======================================================
  Files                          ?       30           
  Lines                          ?    11088           
  Branches                       ?        0           
======================================================
  Hits                           ?     8220           
  Misses                         ?     2868           
  Partials                       ?        0           


Peter-J-Freeman and others added 23 commits September 10, 2024 15:39
Sometimes the normalising variant mapping process trims the returned
variant, which can cause issues. For now this only changes the mapper
used for checking the content of within-intron spans, so that the result
matches the requested coordinates. We may change more later.
This fixes at least two bugs, the more fundamental of which is that the
original function did not actually check the exon boundaries directly;
it just ran a mapping and looked for changes. The new check should be
quicker, should fail even when the normal mapping functions would give a
pass (which seems to happen sometimes even with bad exon boundaries),
and should only fail for the specific problems that we want to check for.
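As a rough illustration of what checking exon boundaries directly can look like, rather than running a mapping and looking for changes, here is a minimal sketch. The exon table, the `is_exon_boundary` name and the coordinate layout are all assumptions made for this example; they are not taken from the VariantValidator code.

```python
# Hypothetical exon table: (start, end) in 1-based inclusive transcript (n.)
# coordinates. Illustration only, not real VariantValidator data.
EXONS = [(1, 120), (121, 300), (301, 450)]


def is_exon_boundary(base, offset, exons=EXONS):
    """Return True if a position such as n.120+5 or n.121-3 is anchored
    on a real exon boundary.

    A positive offset must hang off the last base of an exon that has a
    downstream intron; a negative offset must hang off the first base of
    an exon that has an upstream intron.
    """
    if offset == 0:
        # Not an intronic position, so there is no boundary to check.
        return True
    if offset > 0:
        # The final exon has no downstream intron, so skip it.
        return any(base == end for _, end in exons[:-1])
    # The first exon has no upstream intron, so skip it.
    return any(base == start for start, _ in exons[1:])
```

With this table, `is_exon_boundary(120, 5)` passes while `is_exon_boundary(119, 5)` fails, without any mapping step being involved.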
There were strand and '+' based offset issues for intronic variant
locations in expanded repeats when re-building the repeat span from the
initial start point. This code switches to using the full repeat and
taking the correct end, based on strand, for the start coordinates,
fixing these issues.
+ some preparatory cleanup ready for later additions
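The strand-dependent choice described in the commit message above could be sketched as follows. This is only an assumed shape for the idea (take the full repeat span, then pick the end that comes first in transcript orientation); the function name and the `1`/`-1` strand encoding are hypothetical.

```python
def repeat_start_for_strand(span_start, span_end, strand):
    """Pick the genomic coordinate where a repeat begins when read in
    transcript orientation.

    span_start and span_end are the genomic bounds of the full repeat,
    with span_start <= span_end. strand is 1 (forward) or -1 (reverse);
    this encoding is an assumption of the sketch.
    """
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    # On the reverse strand the transcript runs from high to low genomic
    # coordinates, so the repeat starts at the high end of the span.
    return span_start if strand == 1 else span_end
```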
The underlying seq fetching code is 0 based but much of the overlying
code in the expanded repeat handling acts as if it is 1 based, fix
this.
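For readers less familiar with this class of bug, a minimal sketch of wrapping a 0-based fetch so that callers can keep thinking in 1-based coordinates. The `fetch_one_based` helper is hypothetical and uses a plain Python string in place of the real sequence fetching code.

```python
def fetch_one_based(sequence, start, end):
    """Return the bases from start to end, both 1-based and inclusive.

    Python slicing, like many sequence fetchers, is 0-based and
    half-open, so the 1-based inclusive range [start, end] becomes the
    slice [start - 1, end).
    """
    if start < 1 or end < start or end > len(sequence):
        raise ValueError("coordinates out of range")
    return sequence[start - 1:end]
```

Keeping the conversion in one place avoids scattering `- 1` adjustments through the calling code.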
This test set is conceptually the same as the existing C set, but twice
as long and done with transcripts where there is a pair of n and c
versions with the same within transcript sequence, and the same from
transcript start coordinates, for the section in question. This allows
us to test the offset c based coordinates against the more raw 1 based
non offset n data. This test set also includes more n+offset type
coordinates, as opposed to the mainly n-offset coordinates in the
original set, along with more multi-base repeats, which adds to the
test coverage.
Both a few basic tests and a reverse genome -> transcript set, which is
equivalent to a genomic version of those C tests with successful genomic
mapping.
We used to need to check the transcript type, but we now handle all ref
types within the expanded repeat code, so this feature is no longer used.
Although the variant position is still messed with, we no longer convert
the variant into a pseudo-g type and back when we get an intronic
coordinate. This patch also cleans up the use of offset, without
specifiers, in variable and function names, which is ambiguous. In the
process we end up fixing an n type to c type coordinate conversion bug
and preparing for the addition of 3' UTR handling.
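For context, the n type to c type conversion follows the HGVS numbering rules: c.1 is the first base of the start codon, there is no c.0, and positions after the stop codon are written as c.*1 onwards. The sketch below illustrates those rules only; the `n_to_c` name and its parameters are hypothetical and it is not the conversion function from this patch.

```python
def n_to_c(n_pos, cds_start, cds_end):
    """Convert a 1-based n. position to an HGVS c. position string.

    cds_start is the n. position of the first base of the start codon
    and cds_end the n. position of the last base of the stop codon.
    """
    if n_pos < 1:
        raise ValueError("n. positions are 1-based")
    if n_pos < cds_start:
        # 5' UTR: counts down from c.-1, skipping c.0.
        return f"c.{n_pos - cds_start}"
    if n_pos <= cds_end:
        # Coding sequence: c.1 is the first base of the start codon.
        return f"c.{n_pos - cds_start + 1}"
    # 3' UTR: counts up from c.*1 after the stop codon.
    return f"c.*{n_pos - cds_end}"
```

The missing c.0 is the usual source of off-by-one errors here, since the 5' UTR branch and the coding branch differ by exactly one.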
intronic_or_utr would sometimes store the UTR status, but this was
actually unused; instead it was mostly used to store the transcript when
an intron was detected, as part of the handling for the now fixed abuse
of the reference variable.
The fixed regular expression no longer misses bad characters in a repeat
when at least one good repeat character is included.
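The actual pattern is not shown on this page, so the following is only a sketch of the general failure mode: a check that searches for any valid character passes as soon as one good base is present, whereas a full match rejects any stray character. Both function names and the `ACGT` alphabet are assumptions of the example.

```python
import re


def loose_repeat_check(repeat):
    """Buggy style of check: passes if at least one valid base is found."""
    return re.search(r"[ACGT]", repeat) is not None


def strict_repeat_check(repeat):
    """Stricter check: every character of the repeat must be a valid base."""
    return re.fullmatch(r"[ACGT]+", repeat) is not None
```

For the input `"CAX"`, the loose check accepts the repeat because of the leading `C`, while the strict check rejects it because of the `X`.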
Also tidy up some logic left over from previous c<->n mapping methods
now that we centralise it into separate functions. We also slightly
adjust function naming and input logic from get_range_from_single to
get_range_from_single_or_start to match actual usage.
Also adjust test for get_range_from_single_pos to match the new name of
get_range_from_single_or_start_pos.
We do not use this function any more in the current usage pattern.
Also some use of startswith instead of re.match in ref type checking.
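A small sketch of the kind of substitution this describes, assuming the ref type check is a prefix test on the accession. The prefix tuple and the function name are illustrative only and are not taken from the patch.

```python
# Hypothetical set of transcript accession prefixes, for illustration.
TRANSCRIPT_PREFIXES = ("NM_", "NR_", "ENST")


def is_transcript_reference(accession):
    """Return True if the accession starts with a transcript prefix.

    str.startswith accepts a tuple of prefixes, so no regular expression
    is needed for a fixed-prefix test like this one.
    """
    return accession.startswith(TRANSCRIPT_PREFIXES)
```

For fixed prefixes, `startswith` states the intent more directly than `re.match` and avoids having to escape regex metacharacters.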
Updates to the expanded repeat code, for simple repeats only so far
3 participants