Adds optional arguments to report k-best alignments with a NH tag #499

Open
wants to merge 5 commits into base: master

@milkschen milkschen commented Dec 25, 2024

Design

This PR adds two new optional arguments to bowtie2-align:

  1. --kbest. Intended for use with the -k alignment mode. It reports up to k alignments, restricted to those whose alignment score equals that of the primary alignment.
  2. --show-nh-tag. When set, adds the NH:i tag to each record to report the number of alignments reported for the query. This is particularly useful in the -k/-a modes.
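
The intended semantics of the two options can be sketched as follows. This is a minimal Python illustration of the behavior described above, not the C++ code in this PR. The `Alignment` type and the `kbest_filter` and `nh_value` helpers are hypothetical names, and the sketch assumes the first alignment in the list is the primary one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alignment:
    """Hypothetical stand-in for one candidate alignment of a query."""
    ref: str
    pos: int
    score: int  # alignment score, as reported in the AS:i tag


def kbest_filter(alignments: List[Alignment], k: int) -> List[Alignment]:
    """Keep at most k alignments whose score equals the primary's score.

    Assumes alignments[0] is the primary alignment.
    """
    if not alignments:
        return []
    primary_score = alignments[0].score
    tied = [a for a in alignments if a.score == primary_score]
    return tied[:k]


def nh_value(reported: List[Alignment]) -> int:
    """NH:i is the number of alignments reported for the query."""
    return len(reported)


# Scores taken from the TRL-CAA6-1 records in the test case.
candidates = [
    Alignment("chr1", 160874498, 0),
    Alignment("chr1", 160956287, 0),
    Alignment("chr16", 63096452, -50),
]
reported = kbest_filter(candidates, k=3)
print(len(reported), nh_value(reported))
```

With these scores, the alignment with score -50 is dropped and the two tied alignments are kept, so each reported record would carry NH:i:2.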

The changes should not affect default bowtie2-align behavior; I saw no differences in my test runs.

Test case

FASTA (two tRNA sequences):

>TRL-CAA6-1 Homo sapiens (human) tRNA-Leu (anticodon CAA) 6-1 (TRL-CAA6-1) URS00006DCB39_9606
GTCAGGATGGCCGAGCAGTCTTAAGGCGCTGCGTTCAAATCGCACCCTCCGCTGGAGGCGTGGGTTCGAATCCCACTTTTGACA
>TRS-AGA1-1 Homo sapiens (human) tRNA-Ser (anticodon AGA) 1-1 (TRS-AGA1-1) URS00006C14B2_9606
GTAGTCGTGGCCGAGTGGTTAAGGCGATGGACTAGAAATCCATTGGGGTTTCCCCGCGCAGGTTCGAATCCTGCCGACTACG

Resulting SAM mapped to chm13v2 with -k 3 (the changes in this PR do not affect this run):

TRL-CAA6-1	16	chr1	160874498	1	84M	*	0	0	TGTCAAAAGTGGGATTCGAACCCACGCCTCCAGCGGAGGGTGCGATTTGAACGCAGCGCCTTAAGACTGCTCGGCCATCCTGAC	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	AS:i:0	XS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:84	YT:Z:UU
TRL-CAA6-1	272	chr1	160956287	255	84M	*	0	0	TGTCAAAAGTGGGATTCGAACCCACGCCTCCAGCGGAGGGTGCGATTTGAACGCAGCGCCTTAAGACTGCTCGGCCATCCTGAC	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	AS:i:0	XS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:84	YT:Z:UU
TRL-CAA6-1	272	chr16	63096452	255	62M1I21M	*	0	0	TGTCAAAAGTGGGATTCGAACCCACGCCTCCAGCGGAGGGTGCGATTTGAACGCAGCGCCTTAAGACTGCTCGGCCATCCTGAC	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	AS:i:-50	XS:i:0	XN:i:0	XM:i:7	XO:i:1	XG:i:1	NM:i:8	MD:Z:5G27G4A0C5C0C19C16	YT:Z:UU
TRS-AGA1-1	16	chr6	27411350	30	82M	*	0	0	CGTAGTCGGCAGGATTCGAACCTGCGCGGGGAAACCCCAATGGATTTCTAGTCCATCGCCTTAACCACTCGGCCACGACTAC	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	AS:i:0	XS:i:-6	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:82	YT:Z:UU
TRS-AGA1-1	272	chr8	96394913	255	82M	*	0	0	CGTAGTCGGCAGGATTCGAACCTGCGCGGGGAAACCCCAATGGATTTCTAGTCCATCGCCTTAACCACTCGGCCACGACTAC	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	AS:i:-6	XS:i:-6	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:32G49	YT:Z:UU
TRS-AGA1-1	272	chr17	8132840	255	82M	*	0	0	CGTAGTCGGCAGGATTCGAACCTGCGCGGGGAAACCCCAATGGATTTCTAGTCCATCGCCTTAACCACTCGGCCACGACTAC	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	AS:i:-6	XS:i:-6	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:32G49	YT:Z:UU
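
For readers less familiar with the FLAG column: the records above use 16 and 272. Per the SAM specification, bit 0x10 marks a reverse-strand alignment and bit 0x100 marks a secondary alignment, so 272 = 256 + 16 is a secondary alignment on the reverse strand. A small Python sketch of that decoding (the `describe_flag` helper is a hypothetical name used only for illustration):

```python
# FLAG bits defined by the SAM specification.
FLAG_REVERSE = 0x10     # SEQ is reverse-complemented
FLAG_SECONDARY = 0x100  # secondary alignment


def describe_flag(flag: int) -> dict:
    """Report whether the reverse-strand and secondary bits are set."""
    return {
        "reverse": bool(flag & FLAG_REVERSE),
        "secondary": bool(flag & FLAG_SECONDARY),
    }


for flag in (16, 272):
    print(flag, describe_flag(flag))
```

In each group of records, the line with FLAG 16 is therefore the primary alignment, and the lines with FLAG 272 are the additional alignments reported by -k.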

Resulting SAM mapped to chm13v2 with -k 3 --kbest:

TRL-CAA6-1	16	chr1	160874498	1	84M	*	0	0	TGTCAAAAGTGGGATTCGAACCCACGCCTCCAGCGGAGGGTGCGATTTGAACGCAGCGCCTTAAGACTGCTCGGCCATCCTGAC	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	AS:i:0	XS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:84	YT:Z:UU
TRL-CAA6-1	272	chr1	160956287	255	84M	*	0	0	TGTCAAAAGTGGGATTCGAACCCACGCCTCCAGCGGAGGGTGCGATTTGAACGCAGCGCCTTAAGACTGCTCGGCCATCCTGAC	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	AS:i:0	XS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:84	YT:Z:UU
TRS-AGA1-1	16	chr6	27411350	30	82M	*	0	0	CGTAGTCGGCAGGATTCGAACCTGCGCGGGGAAACCCCAATGGATTTCTAGTCCATCGCCTTAACCACTCGGCCACGACTAC	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	AS:i:0	XS:i:-6	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:82	YT:Z:UU

Resulting SAM mapped to chm13v2 with -k 3 --kbest --show-nh-tag:

TRL-CAA6-1	16	chr1	160874498	1	84M	*	0	0	TGTCAAAAGTGGGATTCGAACCCACGCCTCCAGCGGAGGGTGCGATTTGAACGCAGCGCCTTAAGACTGCTCGGCCATCCTGAC	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	AS:i:0	XS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NH:i:2	NM:i:0	MD:Z:84	YT:Z:UU
TRL-CAA6-1	272	chr1	160956287	255	84M	*	0	0	TGTCAAAAGTGGGATTCGAACCCACGCCTCCAGCGGAGGGTGCGATTTGAACGCAGCGCCTTAAGACTGCTCGGCCATCCTGAC	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	AS:i:0	XS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NH:i:2	NM:i:0	MD:Z:84	YT:Z:UU
TRS-AGA1-1	16	chr6	27411350	30	82M	*	0	0	CGTAGTCGGCAGGATTCGAACCTGCGCGGGGAAACCCCAATGGATTTCTAGTCCATCGCCTTAACCACTCGGCCACGACTAC	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	AS:i:0	XS:i:-6	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NH:i:1	NM:i:0	MD:Z:82	YT:Z:UU

Per the [SAM spec](https://samtools.github.io/hts-specs/SAMtags.pdf), the `NH:i` tag reports the "Number of reported alignments that contain the query in the current record". It is useful when multiple alignments can be reported for one query.
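
One way to sanity-check output produced with both options is to confirm, per query, that every record has the same `AS:i` value and that `NH:i` equals the number of records for that query. The following is a rough Python sketch under those assumptions, not part of this PR. The `parse_tags` and `check_kbest_nh` helpers are hypothetical, and the sample records are abbreviated from the test case (SEQ and QUAL replaced by `*`, most tags dropped).

```python
from collections import defaultdict


def parse_tags(fields):
    """Parse optional SAM fields (TAG:TYPE:VALUE); 'i' values become int."""
    tags = {}
    for field in fields:
        tag, typ, value = field.split(":", 2)
        tags[tag] = int(value) if typ == "i" else value
    return tags


def check_kbest_nh(sam_lines):
    """Return a list of problems; an empty list means the records are consistent."""
    by_query = defaultdict(list)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        cols = line.rstrip("\n").split("\t")
        if int(cols[1]) & 0x4:
            continue  # skip unaligned reads
        by_query[cols[0]].append(parse_tags(cols[11:]))

    problems = []
    for qname, records in by_query.items():
        scores = {r["AS"] for r in records if "AS" in r}
        if len(scores) > 1:
            problems.append(f"{qname}: mixed AS values {sorted(scores)}")
        for r in records:
            if r.get("NH") != len(records):
                problems.append(
                    f"{qname}: NH={r.get('NH')} but {len(records)} records"
                )
    return problems


# Abbreviated records based on the -k 3 --kbest --show-nh-tag output.
records = [
    "TRL-CAA6-1\t16\tchr1\t160874498\t1\t84M\t*\t0\t0\t*\t*\tAS:i:0\tNH:i:2",
    "TRL-CAA6-1\t272\tchr1\t160956287\t255\t84M\t*\t0\t0\t*\t*\tAS:i:0\tNH:i:2",
    "TRS-AGA1-1\t16\tchr6\t27411350\t30\t82M\t*\t0\t0\t*\t*\tAS:i:0\tNH:i:1",
]
print(check_kbest_nh(records))
```

The check groups records by query name only, so it assumes unpaired reads as in the test case; paired-end output would need to account for mates separately.
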
@milkschen milkschen marked this pull request as ready for review December 25, 2024 06:30
@milkschen milkschen changed the title from "Adds a --kbest argument to allow reporting k-best alignments" to "Adds optional arguments to report k-best alignments with a NH tag" Dec 25, 2024