Overlaps counted twice in "Total length (no overlaps)" calculation? #111

Open
jflot opened this issue Apr 6, 2022 · 1 comment
jflot commented Apr 6, 2022

I am wondering how Bandage calculates the "Total length (no overlaps)" statistic. For example, here is a toy GFA with only four contigs and two links:
H VN:Z:1.0
S contig1 AAAAAAAAAA
S contig2 AAAAACCCCC
S contig3 GGGGGGGGGG
S contig4 GGGGGTTTTT
L contig1 + contig2 + 5M
L contig3 + contig4 + 5M

Each contig is 10 bp long, so the total length without overlaps should (in my opinion) be 30 bp, but Bandage reports 20 bp, i.e. each overlap seems to be counted twice. After using the "Merge all possible nodes" tool, however, the total length becomes 30 bp as expected.
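
To make the two candidate totals concrete, here is a minimal arithmetic sketch for the first example. The names `lengths`, `links`, `once` and `twice` are illustrative only; nothing here is taken from Bandage's code.

```python
# Toy graph from the first example: four 10 bp contigs, two 5 bp overlaps.
lengths = {"contig1": 10, "contig2": 10, "contig3": 10, "contig4": 10}
links = [("contig1", "contig2", 5), ("contig3", "contig4", 5)]

total = sum(lengths.values())  # 40 bp of raw sequence

# Each overlap removed once: the shared bases are counted a single time.
once = total - sum(ov for _, _, ov in links)

# Each overlap removed from both contigs that it joins.
twice = total - 2 * sum(ov for _, _, ov in links)

print(once, twice)  # 30 20
```

The first figure matches the 30 bp I expected and the second matches the 20 bp that Bandage reports.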

Another example (with one extra contig and one extra link):
H VN:Z:1.0
S contig1 AAAAAAAAAA
S contig2 AAAAACCCCC
S contig3 GGGGGGGGGG
S contig4 GGGGGTTTTT
S contig5 GGGGGGGGGT
L contig1 + contig2 + 5M
L contig3 + contig4 + 5M
L contig3 + contig5 + 9M

Here the total length (no overlaps) returned by Bandage is 17 bp... Any clue?

@odethier-ulb

[Image: overlap]
I think the length is correct. Take the second example: the total length without overlaps is computed by summing the 'white' (non-overlapping) part of each sequence (the coloured parts are the overlaps), which gives 5 + 5 + 1 + 5 + 1 = 17 bp for contig1 to contig5.
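
The per-sequence counting described above can be sketched as follows. This is only an illustration of that reading, not Bandage's actual implementation: the function name `total_length_no_overlaps` is hypothetical, and the sketch assumes whitespace-separated fields, `+` orientations only, and overlaps written as `<n>M`.

```python
EXAMPLE_1 = """\
H VN:Z:1.0
S contig1 AAAAAAAAAA
S contig2 AAAAACCCCC
S contig3 GGGGGGGGGG
S contig4 GGGGGTTTTT
L contig1 + contig2 + 5M
L contig3 + contig4 + 5M
"""

EXAMPLE_2 = EXAMPLE_1 + """\
S contig5 GGGGGGGGGT
L contig3 + contig5 + 9M
"""


def total_length_no_overlaps(gfa_text):
    """Sum, over all segments, the bases not covered by any link overlap."""
    seqs = {}
    links = []
    for line in gfa_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "S":
            seqs[fields[1]] = fields[2]
        elif fields[0] == "L":
            # Fields: L, source, orientation, target, orientation, overlap
            links.append((fields[1], fields[3], int(fields[5].rstrip("M"))))

    covered = {name: set() for name in seqs}
    for src, dst, n in links:
        src_len = len(seqs[src])
        covered[src].update(range(src_len - n, src_len))  # suffix of the source
        covered[dst].update(range(n))                      # prefix of the target

    return sum(len(seqs[name]) - len(covered[name]) for name in seqs)


print(total_length_no_overlaps(EXAMPLE_1))  # 20
print(total_length_no_overlaps(EXAMPLE_2))  # 17
```

Under this reading an overlap is subtracted from both sequences that share it, which is why the first example gives 20 bp rather than 30 bp.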
