Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate MI tags [feature request] #10

Open
sjackman opened this issue Apr 19, 2017 · 4 comments
Open

Generate MI tags [feature request] #10

sjackman opened this issue Apr 19, 2017 · 4 comments

Comments

@sjackman
Copy link
Contributor

I've aligned reads to a de novo assembly using BWA-MEM rather than Long Ranger align. The BWA alignments have the BX tag but no MI tag. Would you consider implementing a module to group together nearby reads into molecules and adding MI:i tags? No worries at all if you consider this feature outside the scope of bxtools.

lariat (aka longranger align) groups together reads that are within 50 kbp of each other. That's a reasonable default value, but I'd find it very helpful for this to be a configurable parameter. I'd like to set it to something more conservative like perhaps 5 kbp.

@sjackman
Copy link
Contributor Author

sjackman commented May 10, 2017

Any thoughts on this feature request? My motivation is that longranger wgs takes ~3 days to run (for this particular data set), and bwa mem takes ~1.5 hours to run (on the same data set), so I often run BWA first while longranger wgs is running. I'd like to calculate some stats on molecule sizes using bxtools mol, which needs MI:i tags.

@walaj
Copy link
Owner

walaj commented May 11, 2017

Hi Shaun, apologies a bit slow here. I think this is reasonable, and I started on this a few weeks back but got called away by other things. If I understand correctly, this would basically run along the BAM and label BX tags into an MI (or something else maybe) tag if they are within X bp of each other. I suppose then that right after X bp, any other BX of the same value would then be assigned a new MI tag up until the reader has progressed another X bp.

Is that the design you were thinking of?

@sjackman
Copy link
Contributor Author

No apologies needed. Thanks for your response, Jeremiah. Yep, that's what I'm thinking. It should work on a BAM file sorted by target position. It can keep a priority queue of all the barcodes that are current (seen within the last X bp) and dump them as they expire, and dump them when transitioning to a new reference sequence. The default value for X for Lariat is 50 kbp. I think it's perhaps too large, but it's a reasonable default to match behaviour. The default tag should be MI but configurable by the user. If the MI tag already exists, perhaps it would be useful to move that tag to a different tag (XM perhaps), but that's definitely a third-teer feature request.

Write now I'm converting BAM to TSV and doing this in R. Here's the code:
https://github.com/sjackman/picea-glauca-organelles/blob/master/molecules.rmd#group-reads-into-molecules

@sjackman
Copy link
Contributor Author

sjackman commented Nov 2, 2018

This feature is now implemented in the utility tigmint-molecule. See https://github.com/bcgsc/tigmint
The input BAM file must be sorted by barcode and position using samtools sort -tBX.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants