Vv ensembl dev susmi #615

Open
wants to merge 105 commits into
base: ensembl_update_2024
from

Conversation

Peter-J-Freeman
Collaborator

No description provided.

Peter J. Freeman and others added 30 commits July 7, 2022 09:34
commits that tackle what I saw for issue #387. It seems the requested …
Hide more recent version and not part of genome build warnings for irrelevant transcripts
John-F-Wagstaff and others added 30 commits September 15, 2024 21:24
The underlying seq fetching code is 0 based but much of the overlying
code in the expanded repeat handling acts as if it is 1 based, fix
this.
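As an illustration of the 0-based versus 1-based mismatch described in this commit, here is a minimal sketch. The names `fetch_seq` and `fetch_one_based` are hypothetical and are not taken from the project's code; the sketch only assumes the underlying fetch behaves like Python slicing (0-based, end-exclusive) while callers think in 1-based inclusive positions.

```python
def fetch_seq(sequence: str, start_0: int, end_0: int) -> str:
    """Hypothetical low-level fetch: 0-based, end-exclusive (like slicing)."""
    return sequence[start_0:end_0]


def fetch_one_based(sequence: str, start_1: int, end_1: int) -> str:
    """Fetch using 1-based inclusive coordinates, as used in HGVS positions.

    Only the start needs shifting: a 1-based inclusive end equals the
    0-based exclusive end.
    """
    if start_1 < 1 or end_1 < start_1:
        raise ValueError("expected 1 <= start <= end")
    return fetch_seq(sequence, start_1 - 1, end_1)


seq = "ACGTCAGCAG"
# Passing 1-based positions 5..7 straight to the 0-based fetch drops a base.
wrong = fetch_seq(seq, 5, 7)
# Converting first returns the intended three bases.
right = fetch_one_based(seq, 5, 7)
```

Keeping the conversion in one wrapper means the rest of the code can stay in a single coordinate convention.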
This test set is conceptually the same as the existing C set, but twice
as long and done with transcripts where there is a pair of n and c
versions with the same within transcript sequence, and the same from
transcript start coordinates, for the section in question. This allows
us to test the offset c based coordinates against the more raw 1 based
non offset n data. This test set also includes more n+offset type
coordinates as opposed to the mainly n-offset in the original set along
with more multi-base repeats, which adds to the test coverage.
Both a few basic tests and a reverse genome -> transcript set, which is
equivalent to a genomic version of those C tests with successful genomic
mapping.
We used to need to check the transcript type, but we now handle all ref
types within the expanded repeat code, so this feature is no longer used.
Although the variant position is still messed with we no longer convert
the variant into a pseudo-g type and back when we get an intronic
coordinate. This patch also cleans up the use of offset, without
specifiers, in variable and function names, which is ambiguous. In the
process we end up fixing an n type to c type coordinate conversion bug
and preparing for the addition of 3' UTR handling.
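For readers unfamiliar with the n type to c type conversion mentioned here, the following is a rough sketch of the HGVS numbering relationship for exonic positions only. The function `n_to_c` and its parameters `cds_start` and `cds_end` (1-based n. positions of the first and last coding bases) are assumptions for illustration and are not the project's functions; intronic offsets are not handled.

```python
def n_to_c(n_pos: int, cds_start: int, cds_end: int) -> str:
    """Convert a 1-based n. position to a c. position string.

    cds_start and cds_end are the n. positions of the first and last
    bases of the coding sequence (hypothetical inputs).
    """
    if n_pos < 1:
        raise ValueError("n. positions are 1-based")
    if n_pos < cds_start:
        # 5' UTR: counts down from c.-1; there is no c.0.
        return str(n_pos - cds_start)
    if n_pos > cds_end:
        # 3' UTR: counts up from c.*1 after the last coding base.
        return f"*{n_pos - cds_end}"
    # Coding region: c.1 is the first base of the start codon.
    return str(n_pos - cds_start + 1)


# Example with a CDS spanning n.101 to n.400.
positions = [n_to_c(p, 101, 400) for p in (100, 101, 250, 401)]
```

The off-by-one between the 5' UTR branch and the coding branch (no c.0) is the kind of detail where conversion bugs tend to appear.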
intronic_or_utr would sometimes store the UTR status, but this was
actually unused; instead it was mostly used to store the transcript when
an intron was detected, as part of the handling for the now fixed abuse
of the reference variable.
The regular expression was missing bad characters in a repeat if one
good repeat character was included; fix this.
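The actual pattern is not shown in this commit message, so the following is only a guess at the general class of bug: an unanchored match accepts a repeat unit as soon as it finds one valid base, whereas a full match rejects any invalid character. The pattern and the function names `loose_is_valid` and `strict_is_valid` are hypothetical.

```python
import re

# Assumed alphabet for repeat units; the project's real pattern may differ.
REPEAT_BASES = re.compile(r"[ACGTU]+")


def loose_is_valid(unit: str) -> bool:
    """Buggy style check: only requires the unit to start with valid bases."""
    return REPEAT_BASES.match(unit) is not None


def strict_is_valid(unit: str) -> bool:
    """Stricter check: every character in the unit must be a valid base."""
    return REPEAT_BASES.fullmatch(unit) is not None


# "CAX" slips through the loose check because "CA" matches at the start.
loose_result = loose_is_valid("CAX")
strict_result = strict_is_valid("CAX")
```

Using `fullmatch` (or anchoring the pattern with `^` and `$`) closes the gap without changing the accepted alphabet.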
Also tidy up some logic leftover from previous c<->n mapping methods
now that we centralise it into separate functions.  We also slightly
adjust function naming and input logic from get_range_from_single to
get_range_from_single_or_start to match actual usage.
Also adjust test for get_range_from_single_pos to match the new name of
get_range_from_single_or_start_pos.
We do not use this function any more in the current usage pattern.
Also some use of startswith instead of re.match in ref type checking.
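A small sketch of the startswith versus re.match swap mentioned above. The prefix list and the function names are illustrative assumptions rather than the project's code; `NM_` and `NR_` are RefSeq transcript prefixes and `ENST` is the Ensembl transcript prefix.

```python
import re

# Assumed set of transcript accession prefixes, for illustration only.
TRANSCRIPT_PREFIXES = ("NM_", "NR_", "ENST")


def is_transcript_regex(accession: str) -> bool:
    """Regex version: re.match anchors at the start of the string."""
    return re.match(r"NM_|NR_|ENST", accession) is not None


def is_transcript_startswith(accession: str) -> bool:
    """Equivalent check; str.startswith accepts a tuple of prefixes."""
    return accession.startswith(TRANSCRIPT_PREFIXES)


checks = [is_transcript_startswith(a) for a in ("NM_000088.3", "NC_000017.11")]
```

For fixed literal prefixes the two forms agree, and the startswith version avoids escaping concerns and is easier to read.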
Updates to the expanded repeat code for simple repeats only so far
…e and introns in r. descriptions as referred to in #545
…ing and HGNC genes with no transcript info openvar/rest_variantValidator#186 and also handle the longer deletions in #651
…nate alignments in patches vs the primary assembly. Issue #657
…3 prime UTRs in uncertain positions for the LOVD paper
3 participants