Vv ensembl dev susmi #615
Open
Peter-J-Freeman wants to merge 92 commits into ensembl_update_2024 from vv_ensembl_dev_susmi
Update to vvta
commits that tackle what I saw for issue #387. It seems the requested …
Remove bad return statement for issue https://github.com/openvar/vari…
Vv ensembl develop
Vv ensembl dev s working
Vv ensembl
Hide more recent version and not part of genome build warnings for irrelevant transcripts
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## ensembl_update_2024 #615 +/- ##
======================================================
Coverage ? 74.13%
======================================================
Files ? 30
Lines ? 11088
Branches ? 0
======================================================
Hits ? 8220
Misses ? 2868
Partials ? 0
Sometimes the normalising variant mapping process trims the returned variant, which can cause issues. For now, this only changes the mapper used to check the content within intron spans, so that the result matches the requested coordinates. We may change more later.
This fixes at least two bugs. The more fundamental one is that the original function did not directly check the exon boundaries; it just ran a mapping and looked for changes. The new check should be quicker, should fail even when the normal mapping functions would pass (which seems to happen sometimes even with bad exon boundaries), and should only fail for the specific problems that we want to check for.
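A direct boundary check of the kind described above might look like the following minimal sketch. The function name, the `Exon` tuple layout (1-based inclusive transcript coordinates), and the rule that offsets are only valid next to an intron are illustrative assumptions, not the code from this pull request.

```python
from typing import List, Tuple

# Hypothetical layout: (start, end) in 1-based inclusive transcript (n.) coordinates,
# listed in transcript order with no gaps between consecutive exons.
Exon = Tuple[int, int]


def offset_matches_exon_boundary(base: int, offset: int, exons: List[Exon]) -> bool:
    """Check a position such as n.100+5 directly against the exon structure."""
    if offset == 0:
        # Exonic position: it only needs to fall inside some exon.
        return any(start <= base <= end for start, end in exons)
    if offset > 0:
        # A '+' offset must hang off the last base of an exon that is followed by an intron.
        return any(base == end for _, end in exons[:-1])
    # A '-' offset must hang off the first base of an exon that is preceded by an intron.
    return any(base == start for start, _ in exons[1:])
```

Because this looks only at the exon table, it does not depend on whether a full mapping round trip happens to succeed, which is the property the commit message asks for.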
There were strand and '+' based offset issues for intronic variant locations in expanded repeats when rebuilding the repeat span from the initial start point. This code switches to using the full repeat and taking the correct end, based on strand, for the start coordinates, which fixes these issues.
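The idea of taking the correct end of the full repeat based on strand can be sketched as below. This is an illustration only: the function name, the `+1`/`-1` strand encoding, and the 1-based inclusive genomic span are assumptions rather than details taken from the patch.

```python
from typing import Tuple


def transcript_oriented_ends(g_start: int, g_end: int, strand: int) -> Tuple[int, int]:
    """Return (first, last) genomic positions of a repeat in transcript reading order.

    g_start and g_end are the ends of the full repeat span on the genome,
    with g_start <= g_end. On the minus strand the transcript reads the span
    from the higher genomic coordinate down to the lower one.
    """
    if strand not in (1, -1):
        raise ValueError("strand must be 1 or -1")
    if g_start > g_end:
        raise ValueError("g_start must not exceed g_end")
    if strand == 1:
        return g_start, g_end
    return g_end, g_start
```

Deriving the start from the full span, rather than rebuilding the span outward from an initial start point, avoids having to apply a sign-dependent offset in two different places.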
+ some preparatory cleanup ready for later additions
The underlying sequence-fetching code is 0-based, but much of the overlying code in the expanded repeat handling acts as if it is 1-based. Fix this.
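The mismatch between 0-based fetching and 1-based callers can be shown with a small sketch. Here a plain Python string stands in for the real sequence source, and `fetch_one_based` is a hypothetical wrapper name, not a function from the codebase.

```python
def fetch_one_based(seq: str, start: int, end: int) -> str:
    """Fetch seq[start..end], where start and end are 1-based and inclusive.

    Python slicing is 0-based and half-open, so the start is shifted down by
    one while the end is used unchanged.
    """
    if start < 1 or end < start or end > len(seq):
        raise ValueError("coordinates out of range")
    return seq[start - 1:end]
```

Passing 1-based coordinates straight into a 0-based half-open fetch returns a span that is shifted by one base and one base short, which is easy to miss when the surrounding sequence is itself repetitive.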
This test set is conceptually the same as the existing C set, but twice as long, and uses transcripts where there is a pair of n and c versions with the same within-transcript sequence and the same from-transcript-start coordinates for the section in question. This allows us to test the offset c-based coordinates against the rawer 1-based, non-offset n data. This test set also includes more n+offset type coordinates, as opposed to the mainly n-offset coordinates in the original set, along with more multi-base repeats, which adds to the test coverage.
Includes both a few basic tests and a reverse genome -> transcript set, which is equivalent to a genomic version of those C tests with successful genomic mapping.
We used to need to check the transcript type, but we now handle all ref types within the expanded repeat code, so this feature is no longer used.
Although the variant position is still modified, we no longer convert the variant into a pseudo-g type and back when we get an intronic coordinate. This patch also cleans up the use of "offset" without specifiers in variable and function names, which is ambiguous. In the process we fix an n type to c type coordinate conversion bug and prepare for the addition of 3' UTR handling.
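For readers unfamiliar with the n to c conversion mentioned above, the following sketch shows the standard HGVS numbering rule: c.1 is the first base of the start codon, there is no c.0, and positions after the stop codon are written as `*N`. The function name and the choice of 1-based inclusive `cds_start` and `cds_end` arguments are assumptions for illustration; the actual helper in this patch may be structured differently.

```python
def n_to_c(n_pos: int, cds_start: int, cds_end: int) -> str:
    """Convert a 1-based n. position to a c. position string (without the 'c.' prefix).

    cds_start and cds_end are the 1-based n. positions of the first and last
    bases of the coding sequence.
    """
    if n_pos < 1:
        raise ValueError("n. positions start at 1")
    if n_pos < cds_start:
        # 5' UTR: the base immediately before the start codon is c.-1.
        return str(n_pos - cds_start)
    if n_pos > cds_end:
        # 3' UTR: the base immediately after the stop codon is c.*1.
        return f"*{n_pos - cds_end}"
    # Coding region: the first coding base is c.1.
    return str(n_pos - cds_start + 1)
```

The asymmetry between the 5' UTR branch and the coding branch comes from the missing position zero, which is a common source of off-by-one errors in this kind of conversion.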
intronic_or_utr would sometimes store the UTR status, but this was unused. Instead it was mostly used to store the transcript when an intron was detected, as part of the handling for the now-fixed abuse of the reference variable. The regular expression fix addresses bad characters in a repeat being missed if one good repeat character was included.
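The class of regular expression bug described above, where one valid character is enough to let invalid ones through, can be reproduced with a short sketch. Both patterns and function names here are hypothetical and are not the expressions used in the codebase.

```python
import re

# Hypothetical loose check: succeeds as soon as any single valid base is found.
LOOSE = re.compile(r"[ACGT]")
# Hypothetical strict check: every character must be a valid base.
STRICT = re.compile(r"[ACGT]+")


def loose_is_valid(repeat: str) -> bool:
    """Buggy style of check: one good character hides any bad ones."""
    return LOOSE.search(repeat) is not None


def strict_is_valid(repeat: str) -> bool:
    """Stricter check: the whole repeat unit must consist of valid bases."""
    return STRICT.fullmatch(repeat) is not None
```

With these definitions a repeat unit such as `CAX` passes the loose check but is rejected by the strict one.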
Also tidy up some logic left over from previous c<->n mapping methods, now that we centralise it into separate functions. We also slightly adjust function naming and input logic, from get_range_from_single to get_range_from_single_or_start, to match actual usage.
Also adjust test for get_range_from_single_pos to match the new name of get_range_from_single_or_start_pos.
We do not use this function any more in the current usage pattern.
Also some use of startswith instead of re.match in ref type checking.
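As a rough illustration of the startswith change, a prefix test anchored at the start of a string can be written either way. The prefixes and function names below are hypothetical examples, not the reference types actually checked in the code.

```python
import re

# Hypothetical accession prefixes, used for illustration only.
TRANSCRIPT_PREFIXES = ("NM_", "NR_", "ENST")


def is_transcript_ref_regex(accession: str) -> bool:
    """Prefix check written with re.match, which anchors at the string start."""
    return re.match(r"NM_|NR_|ENST", accession) is not None


def is_transcript_ref(accession: str) -> bool:
    """Equivalent check with str.startswith, which accepts a tuple of prefixes."""
    return accession.startswith(TRANSCRIPT_PREFIXES)
```

For fixed literal prefixes the two forms give the same answer, and `startswith` avoids regular expression escaping concerns while stating the intent more directly.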
Updates to the expanded repeat code, for simple repeats only so far
Fix outstanding bugs
…iantValidator into vv_ensembl_dev_susmi
No description provided.