Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bwa-meth and samtools flagstat #42

Open
anusurendra opened this issue Mar 17, 2017 · 8 comments
Open

bwa-meth and samtools flagstat #42

anusurendra opened this issue Mar 17, 2017 · 8 comments

Comments

@anusurendra
Copy link

Hi,
I used bwa-meth to align rust infected wheat samples to the rust genome. However, when I ran samtools flagstat I get the follow output:

22806328 + 4861610 in total (QC-passed reads + QC-failed reads)
154024 + 1050636 secondary
0 + 0 supplementary
0 + 0 duplicates
22634029 + 4846318 mapped (99.24% : 99.69%)
22652304 + 3810974 paired in sequencing
11326152 + 1905487 read1
11326152 + 1905487 read2
22435338 + 0 properly paired (99.04% : 0.00%)
22467696 + 3780390 with itself and mate mapped
12309 + 15292 singletons (0.05% : 0.40%)
30982 + 232010 with mate mapped to a different chr
13721 + 50746 with mate mapped to a different chr (mapQ>=5)

I didn't know BWA-meth gave secondary alignments. Also, the total number of QC-passed reads + QC-failed reads (27480347) is greater than the total of the pair-end reads (26463278).

@brentp
Copy link
Owner

brentp commented Mar 18, 2017

Hi, what is your question?

@anusurendra
Copy link
Author

anusurendra commented Mar 20, 2017

Apologizes Brent for not being clear,

When I was looking at the flagstat, I just found that the secondary alignment values weird. It seems that there are large portion have more than one alignment. But it may have to do with the fact that the sample has both wheat and rust. Thanks again for your patience and help. As a side note, I found BWA-meth is great and has better alignments than Bismark!!!

@nchernia
Copy link
Contributor

nchernia commented Feb 7, 2018

I think this is due to the issue I just posted, #50

BWA sets as QC-failed when reads are soft-clipped, see https://bioinformatics.stackexchange.com/questions/2988/interpreting-0x200-flag-in-bwa-mem-alignments

I ran BWA without the -p flag but with the other flags and had no QC failed alignments.

@nchernia
Copy link
Contributor

nchernia commented Feb 7, 2018

Actually it seems this is due to lines 361-390 in the master script.

@tuuba06
Copy link

tuuba06 commented Jan 18, 2024

Hi,
For DNA methylation analysis, I used bwa-meth to align reads from poppy samples to the poppy genome. When I run samtools flagstat I get the following output. There are fairly high QC-failed reads. How can I reduce QC-failed reads? Thanks. @brentp

36809078 + 31632003 in total (QC-passed reads + QC-failed reads)
26158 + 118693 secondary
0 + 0 supplementary
0 + 0 duplicates
34068600 + 31567323 mapped (92.55% : 99.80%)
36782920 + 31513310 paired in sequencing
18391460 + 15756655 read1
18391460 + 15756655 read2
33895914 + 0 properly paired (92.15% : 0.00%)
34013748 + 31383950 with itself and mate mapped
28694 + 64680 singletons (0.08% : 0.21%)
66884 + 108786 with mate mapped to a different chr
18024 + 44290 with mate mapped to a different chr (mapQ>=5)

@brentp
Copy link
Owner

brentp commented Jan 19, 2024

Hi, you can try the --do-not-penalize-chimeras argument.

@tuuba06
Copy link

tuuba06 commented Jan 22, 2024

Hİ,
Thank you for your suggestion. I can try the --do-not-penalize-chimeras argument. QC-failed reads reduced. I am adding the same sample samtools flagstat output below. I am doing analysis with multiple samples. QC fail reads were quite high on 3 of my samples compared to my other samples.
Does the --do-not-penalize-chimeras argument affect the accuracy of my data? What effect does it have? Should I use the --do-not-penalize-chimeras argument in my other samples as well? Can you advice? @brentp
Thanks

68441081 + 0 in total (QC-passed reads + QC-failed reads)
144851 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
65635923 + 0 mapped (95.90% : N/A)
68296230 + 0 paired in sequencing
34148115 + 0 read1
34148115 + 0 read2
65152402 + 0 properly paired (95.40% : N/A)
65397698 + 0 with itself and mate mapped
93374 + 0 singletons (0.14% : N/A)
175670 + 0 with mate mapped to a different chr
68445 + 0 with mate mapped to a different chr (mapQ>=5)

@brentp
Copy link
Owner

brentp commented Jan 23, 2024

Hi, without --do-not-penalize-chimeras, then bwameth does this:

bwa-meth/bwameth.py

Lines 436 to 444 in 7c8e1eb

if not do_not_penalize_chimeras:
# here we have a heuristic that if the longest match is not 44% of the
# sequence length, we mark it as failed QC and un-pair it. At the end
# of the loop we set all members of this pair to be unmapped
if aln.longest_match() < (len(orig_seq) * 0.44):
aln.flag |= 0x200 # fail qc
aln.flag &= (~0x2) # un-pair
aln.mapq = min(int(aln.mapq), 1)

so it find reads where there is a lot of soft-clipping. This can happen with BS-Seq and usually leads to lower quality results. You'll have to decide for yourself when to use the argument.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants