bwa-meth and samtools flagstat #42

anusurendra · 2017-03-17T22:11:56Z

Hi,
I used bwa-meth to align rust infected wheat samples to the rust genome. However, when I ran samtools flagstat I get the follow output:

22806328 + 4861610 in total (QC-passed reads + QC-failed reads)
154024 + 1050636 secondary
0 + 0 supplementary
0 + 0 duplicates
22634029 + 4846318 mapped (99.24% : 99.69%)
22652304 + 3810974 paired in sequencing
11326152 + 1905487 read1
11326152 + 1905487 read2
22435338 + 0 properly paired (99.04% : 0.00%)
22467696 + 3780390 with itself and mate mapped
12309 + 15292 singletons (0.05% : 0.40%)
30982 + 232010 with mate mapped to a different chr
13721 + 50746 with mate mapped to a different chr (mapQ>=5)

I didn't know BWA-meth gave secondary alignments. Also, the total number of QC-passed reads + QC-failed reads (27480347) is greater than the total of the pair-end reads (26463278).

The text was updated successfully, but these errors were encountered:

brentp · 2017-03-18T01:32:39Z

Hi, what is your question?

anusurendra · 2017-03-20T21:02:42Z

Apologizes Brent for not being clear,

When I was looking at the flagstat, I just found that the secondary alignment values weird. It seems that there are large portion have more than one alignment. But it may have to do with the fact that the sample has both wheat and rust. Thanks again for your patience and help. As a side note, I found BWA-meth is great and has better alignments than Bismark!!!

nchernia · 2018-02-07T00:34:18Z

I think this is due to the issue I just posted, #50

BWA sets as QC-failed when reads are soft-clipped, see https://bioinformatics.stackexchange.com/questions/2988/interpreting-0x200-flag-in-bwa-mem-alignments

I ran BWA without the -p flag but with the other flags and had no QC failed alignments.

nchernia · 2018-02-07T00:59:28Z

Actually it seems this is due to lines 361-390 in the master script.

tuuba06 · 2024-01-18T20:46:28Z

Hi,
For DNA methylation analysis, I used bwa-meth to align reads from poppy samples to the poppy genome. When I run samtools flagstat I get the following output. There are fairly high QC-failed reads. How can I reduce QC-failed reads? Thanks. @brentp

36809078 + 31632003 in total (QC-passed reads + QC-failed reads)
26158 + 118693 secondary
0 + 0 supplementary
0 + 0 duplicates
34068600 + 31567323 mapped (92.55% : 99.80%)
36782920 + 31513310 paired in sequencing
18391460 + 15756655 read1
18391460 + 15756655 read2
33895914 + 0 properly paired (92.15% : 0.00%)
34013748 + 31383950 with itself and mate mapped
28694 + 64680 singletons (0.08% : 0.21%)
66884 + 108786 with mate mapped to a different chr
18024 + 44290 with mate mapped to a different chr (mapQ>=5)

brentp · 2024-01-19T09:09:08Z

Hi, you can try the --do-not-penalize-chimeras argument.

tuuba06 · 2024-01-22T21:36:41Z

Hİ,
Thank you for your suggestion. I can try the --do-not-penalize-chimeras argument. QC-failed reads reduced. I am adding the same sample samtools flagstat output below. I am doing analysis with multiple samples. QC fail reads were quite high on 3 of my samples compared to my other samples.
Does the --do-not-penalize-chimeras argument affect the accuracy of my data? What effect does it have? Should I use the --do-not-penalize-chimeras argument in my other samples as well? Can you advice? @brentp
Thanks

68441081 + 0 in total (QC-passed reads + QC-failed reads)
144851 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
65635923 + 0 mapped (95.90% : N/A)
68296230 + 0 paired in sequencing
34148115 + 0 read1
34148115 + 0 read2
65152402 + 0 properly paired (95.40% : N/A)
65397698 + 0 with itself and mate mapped
93374 + 0 singletons (0.14% : N/A)
175670 + 0 with mate mapped to a different chr
68445 + 0 with mate mapped to a different chr (mapQ>=5)

brentp · 2024-01-23T07:43:52Z

Hi, without --do-not-penalize-chimeras, then bwameth does this:

bwa-meth/bwameth.py

Lines 436 to 444 in 7c8e1eb

    
           if not do_not_penalize_chimeras: 
        
           # here we have a heuristic that if the longest match is not 44% of the 
        
           # sequence length, we mark it as failed QC and un-pair it. At the end 
        
           # of the loop we set all members of this pair to be unmapped 
        
               if aln.longest_match() < (len(orig_seq) * 0.44): 
        
                   aln.flag |= 0x200  # fail qc 
        
                   aln.flag &= (~0x2) # un-pair 
        
                   aln.mapq = min(int(aln.mapq), 1)

so it find reads where there is a lot of soft-clipping. This can happen with BS-Seq and usually leads to lower quality results. You'll have to decide for yourself when to use the argument.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bwa-meth and samtools flagstat #42

bwa-meth and samtools flagstat #42

anusurendra commented Mar 17, 2017

brentp commented Mar 18, 2017

anusurendra commented Mar 20, 2017 •

edited

Loading

nchernia commented Feb 7, 2018

nchernia commented Feb 7, 2018

tuuba06 commented Jan 18, 2024

brentp commented Jan 19, 2024

tuuba06 commented Jan 22, 2024

brentp commented Jan 23, 2024

bwa-meth and samtools flagstat #42

bwa-meth and samtools flagstat #42

Comments

anusurendra commented Mar 17, 2017

brentp commented Mar 18, 2017

anusurendra commented Mar 20, 2017 • edited Loading

nchernia commented Feb 7, 2018

nchernia commented Feb 7, 2018

tuuba06 commented Jan 18, 2024

brentp commented Jan 19, 2024

tuuba06 commented Jan 22, 2024

brentp commented Jan 23, 2024

anusurendra commented Mar 20, 2017 •

edited

Loading