feat(IPVC-2425): add migration to fix tx-alt-exon-pairs view #34

Merged
merged 2 commits into from
May 8, 2024


@bsgiles73 bsgiles73 commented May 7, 2024

This PR addresses an existing issue with a view in the UTA database, tx_alt_exon_pairs_v. This view is used to find transcript and alternate accession exon pairs for the align-exons method. The goal is to find valid pairs that are missing CIGAR strings provided by uta-align.

The view's query pairs transcript exons where alt_aln_method = 'transcript' with alignment exons where alt_aln_method != 'transcript'. A problem arises when we deprecate an existing transcript, which happens when the CDS start/end or the exon structure has changed. UTA does not delete the old record, nor does it update the CDS or exon values. Instead, it updates the alt_aln_method so that the record is essentially hidden. Here is an example of an exon set record being updated due to a change in the exon definition of the transcript.

Transcript: exon_set_id: 343991; tx_ac: NM_001173991.2; alt_aln_method: transcript

was updated to...

Transcript: exon_set_id: 343991; tx_ac: NM_001173991.2; alt_aln_method: transcript/70b44909

because the exon structure went from 0,306;306,408;408,501;501,702;702,1306 -> 0,306;306,408;408,501;501,703;703,1306
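To make the difference between the two structures easier to see, here is a minimal Python sketch that parses both strings and reports which exons changed. The helper names `parse_exons` and `changed_exons` are hypothetical and not part of UTA, and the 0-based exon ordinals are an assumption made for this illustration.

```python
def parse_exons(structure: str) -> list:
    """Parse 'start,end;start,end;...' into a list of (start, end) tuples."""
    return [tuple(int(x) for x in pair.split(",")) for pair in structure.split(";")]


def changed_exons(old: str, new: str) -> list:
    """Return (ordinal, old_exon, new_exon) for each exon whose coordinates differ.

    Ordinals are 0-based here (an assumption for this sketch).
    """
    old_exons, new_exons = parse_exons(old), parse_exons(new)
    if len(old_exons) != len(new_exons):
        raise ValueError("exon counts differ; compare the structures manually")
    return [(i, o, n) for i, (o, n) in enumerate(zip(old_exons, new_exons)) if o != n]


# Exon structures quoted in this PR description.
OLD = "0,306;306,408;408,501;501,702;702,1306"
NEW = "0,306;306,408;408,501;501,703;703,1306"

for ordinal, before, after in changed_exons(OLD, NEW):
    print(ordinal, before, after)
```

Only the last two exons differ: the shared boundary between them moved from 702 to 703.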

Using the query below, you will see that the updated (deprecated) transcript exon set shows up among the alt_aln_method values that will be passed to align_exons. There is no need to align transcript exons from a deprecated structure to the latest one. This issue can be addressed by adjusting the WHERE criteria of the view.

alt_aln_method != 'transcript' -> alt_aln_method !~ 'transcript'
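The effect of switching from an exact inequality to a regex non-match can be sketched in Python. This is only an approximation of the SQL behavior: it assumes that PostgreSQL's `!~` acts as an unanchored, case-sensitive regex non-match, which `re.search(...) is None` mirrors for this simple pattern. The method values are taken from this PR description.

```python
import re

# alt_aln_method values that appear in this PR description.
METHODS = [
    "splign",
    "splign/d0e7701b",
    "blat",
    "transcript",
    "transcript/6bcc9051",
    "transcript/70b44909",
]

# Old criterion: exact inequality only excludes the literal value 'transcript'.
kept_by_not_equal = [m for m in METHODS if m != "transcript"]

# New criterion: regex non-match excludes any value containing 'transcript'.
kept_by_not_match = [m for m in METHODS if re.search("transcript", m) is None]

print("!= keeps:", kept_by_not_equal)
print("!~ keeps:", kept_by_not_match)
```

With the exact inequality, the deprecated `transcript/<hash>` values slip through; with the regex non-match they are excluded along with the plain `transcript` value.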

To test this, I ran the following query before and after the Alembic migration.

select a.symbol, a.gene_id, a.tx_ac, a.alt_ac, a.alt_aln_method, count(a.ord) as exon_cnt
from (
      -- query from uta.loading.align_exons: keeps exon pairs that are missing an
      --   exon alignment and excludes transcripts whose tx_ac contains a "/" character.
      select *
      from uta.tx_alt_exon_pairs_v as v
      where exon_aln_id is NULL and tx_ac !~ '/'
) as a
where a.tx_ac='NM_001173991.2'
group by a.symbol, a.gene_id, a.tx_ac, a.alt_ac, a.alt_aln_method;

BEFORE:

+-------+-------+--------------+--------------+-------------------+--------+
|symbol |gene_id|tx_ac         |alt_ac        |alt_aln_method     |exon_cnt|
+-------+-------+--------------+--------------+-------------------+--------+
|TMEM216|51259  |NM_001173991.2|AC_000143.1   |splign             |5       |
|TMEM216|51259  |NM_001173991.2|NC_000011.10  |splign             |5       |
|TMEM216|51259  |NM_001173991.2|NC_000011.10  |splign/d0e7701b    |5       |
|TMEM216|51259  |NM_001173991.2|NC_000011.9   |blat               |5       |
|TMEM216|51259  |NM_001173991.2|NC_000011.9   |splign             |5       |
|TMEM216|51259  |NM_001173991.2|NC_018922.2   |splign             |5       |
|TMEM216|51259  |NM_001173991.2|NG_032976.1   |splign             |5       |
|TMEM216|51259  |NM_001173991.2|NM_001173991.2|transcript/6bcc9051|5       |
|TMEM216|51259  |NM_001173991.2|NM_001173991.2|transcript/70b44909|5       |
+-------+-------+--------------+--------------+-------------------+--------+

AFTER:

+-------+-------+--------------+------------+---------------+--------+
|symbol |gene_id|tx_ac         |alt_ac      |alt_aln_method |exon_cnt|
+-------+-------+--------------+------------+---------------+--------+
|TMEM216|51259  |NM_001173991.2|AC_000143.1 |splign         |5       |
|TMEM216|51259  |NM_001173991.2|NC_000011.10|splign         |5       |
|TMEM216|51259  |NM_001173991.2|NC_000011.10|splign/d0e7701b|5       |
|TMEM216|51259  |NM_001173991.2|NC_000011.9 |blat           |5       |
|TMEM216|51259  |NM_001173991.2|NC_000011.9 |splign         |5       |
|TMEM216|51259  |NM_001173991.2|NC_018922.2 |splign         |5       |
|TMEM216|51259  |NM_001173991.2|NG_032976.1 |splign         |5       |
+-------+-------+--------------+------------+---------------+--------+
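As a quick check on the two result sets above, the following Python sketch compares the (alt_ac, alt_aln_method) pairs from the BEFORE and AFTER tables. The rows are copied from the tables in this PR; the variable names are only for illustration.

```python
# (alt_ac, alt_aln_method) pairs copied from the BEFORE table.
BEFORE = [
    ("AC_000143.1", "splign"),
    ("NC_000011.10", "splign"),
    ("NC_000011.10", "splign/d0e7701b"),
    ("NC_000011.9", "blat"),
    ("NC_000011.9", "splign"),
    ("NC_018922.2", "splign"),
    ("NG_032976.1", "splign"),
    ("NM_001173991.2", "transcript/6bcc9051"),
    ("NM_001173991.2", "transcript/70b44909"),
]

# The AFTER table contains the same rows minus the last two.
AFTER = BEFORE[:7]

removed = sorted(set(BEFORE) - set(AFTER))
added = sorted(set(AFTER) - set(BEFORE))

print("removed:", removed)
print("added:", added)
```

The only rows that disappear are the two deprecated `transcript/<hash>` exon sets, and no new rows appear, which matches the intent of the change.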

@bsgiles73 bsgiles73 requested review from sptaylor and nvta1209 May 7, 2024 21:46
@bsgiles73 bsgiles73 merged commit 0c76d42 into main May 8, 2024
1 check passed
@bsgiles73 bsgiles73 deleted the IPVC-2425-fix-tx-alt-exon-pairs branch May 8, 2024 16:28