Obtaining realigned BAM from DeepVariant #939

Open
DevangThakkar opened this issue Feb 20, 2025 · 2 comments
@DevangThakkar

Hi there,

I've been using DeepVariant to call indels in our data and I've found that (as expected, perhaps) the performance is better if we use realignment in DeepVariant (the default). However, I've been looking at the realigned reads (using emit_realigned_reads=True) and I have a couple of questions.

  1. First, there are regions that have overlapping windows and thus overlapping BAMs. See for example positions around an indel at chr21:34792351

    chr21:34792351-34793350/
    chr21:34792035-34792710/

    Is this behavior expected?

  2. Is there a way to obtain the entire realigned BAM instead of just the individual segments? I could merge these together with some deduplication and try to substitute them back into the original BAM, but the realigned reads don't carry the tags that the input file has, so that would be very messy. It would be really helpful to have the entire realigned BAM somehow.

Thank you!

@DevangThakkar
Author

It seems there are regular 1000 bp windows here, but also some irregular ones. If it helps, these are the folders around the region I'm looking at:

chr21:34790351-34791350:	realigned_reads.bam
chr21:34791351-34792350:	realigned_reads.bam
chr21:34791969-34792440:	graph.dot
chr21:34791969-34792751:	graph.dot
chr21:34792035-34792710:	realigned_reads.bam
chr21:34792251-34792751:	graph.dot
chr21:34792351-34793350:	realigned_reads.bam
chr21:34793351-34794350:	realigned_reads.bam
chr21:34793867-34794169:	graph.dot
chr21:34794351-34795350:	realigned_reads.bam
chr21:34795351-34796350:	realigned_reads.bam
chr21:34796351-34797350:	realigned_reads.bam
chr21:34797351-34798350:	realigned_reads.bam
chr21:34798351-34799350:	realigned_reads.bam
chr21:34799012-34799449:	graph.dot
chr21:34799101-34799661:	graph.dot
chr21:34799201-34799562:	realigned_reads.bam
chr21:34799251-34799858:	graph.dot
chr21:34799351-34800350:	realigned_reads.bam

However, this does not always seem to be the case. Here is another random region:

chr12:122021971-122022304:	graph.dot
chr12:122021992-122022283:	realigned_reads.bam
chr12:122030500-122030980:	graph.dot
chr12:122030600-122030881:	realigned_reads.bam
chr12:122035131-122035626:	graph.dot
chr12:122035231-122035527:	realigned_reads.bam
chr12:122043741-122044239:	graph.dot
chr12:122043786-122044153:	realigned_reads.bam
chr12:122054605-122055188:	graph.dot
chr12:122054705-122055089:	realigned_reads.bam
chr12:122058898-122059362:	graph.dot
chr12:122058992-122059263:	realigned_reads.bam
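
For anyone wanting to check which of the emitted windows overlap, here is a minimal sketch in plain Python. It only parses directory names of the form shown in the listings above and treats the coordinates as closed (inclusive) intervals, which is an assumption on my part. The names `Window`, `parse_window`, and `overlapping_pairs` are hypothetical helpers, not part of DeepVariant.

```python
import re
from typing import List, NamedTuple, Tuple


class Window(NamedTuple):
    chrom: str
    start: int
    end: int


# Matches names such as "chr21:34792351-34793350".
_REGION = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_window(name: str) -> Window:
    """Parse a directory name like 'chr21:34792351-34793350/' into a Window."""
    m = _REGION.match(name.strip().rstrip("/"))
    if m is None:
        raise ValueError(f"not a region name: {name!r}")
    return Window(m["chrom"], int(m["start"]), int(m["end"]))


def overlapping_pairs(windows: List[Window]) -> List[Tuple[Window, Window]]:
    """Return every pair of windows on the same contig that share a base.

    Assumes closed intervals: two windows overlap when the later one
    starts at or before the end of the earlier one.
    """
    ordered = sorted(windows)  # sorts by (chrom, start, end)
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # Later windows start even further right, so stop scanning.
            if b.chrom != a.chrom or b.start > a.end:
                break
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    # Directories from the chr21 listing that contain realigned_reads.bam.
    names = [
        "chr21:34791351-34792350",
        "chr21:34792035-34792710",
        "chr21:34792351-34793350",
    ]
    for a, b in overlapping_pairs([parse_window(n) for n in names]):
        print(f"{a.chrom}:{a.start}-{a.end} overlaps {b.chrom}:{b.start}-{b.end}")
```

Only the directories that actually contain `realigned_reads.bam` need to be passed in; the `graph.dot` directories can be filtered out beforehand.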

@akolesnikov
Collaborator

Hi @DevangThakkar,

  1. The behavior is expected. DeepVariant creates windows for realignment by grouping potential variant positions that are within a certain distance of each other and adding padding on both sides. As a result, overlaps are possible.

  2. Unfortunately, it is not possible to output all realigned reads in a single BAM file. DeepVariant processes each 1000-base region separately and independently. Additionally, not all regions can be realigned; in those cases, the original alignment is used.

  3. Realignment is not performed for the entire 1000-base region. Instead, DeepVariant creates the smallest possible realignment windows. As a result, there may be multiple realignment windows within a single 1000-base region.
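
The grouping-plus-padding idea from point 1 can be illustrated with a small sketch. This is not DeepVariant's actual implementation: the function `build_windows` and its parameters `max_gap` and `padding` are hypothetical, chosen only to show why padded windows can end up overlapping.

```python
from typing import Iterable, List, Tuple


def build_windows(
    positions: Iterable[int], max_gap: int, padding: int
) -> List[Tuple[int, int]]:
    """Group candidate positions into padded windows (illustrative only).

    Positions closer than or equal to `max_gap` to the previous position
    join the same cluster; each cluster is then extended by `padding`
    bases on both sides. Both parameters are assumptions for this sketch.
    """
    clusters: List[List[int]] = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][1] <= max_gap:
            clusters[-1][1] = pos  # extend the current cluster
        else:
            clusters.append([pos, pos])  # start a new cluster
    # Pad each cluster; 1-based coordinates cannot go below 1.
    return [(max(1, start - padding), end + padding) for start, end in clusters]


if __name__ == "__main__":
    # Two clusters separated by more than max_gap, but closer than
    # twice the padding, produce windows that overlap each other.
    print(build_windows([100, 105, 150], max_gap=30, padding=40))
```

Whenever two clusters are more than `max_gap` apart but less than twice `padding` apart, the padded windows share bases, which matches the overlap described above.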
