Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistencies in overlap behavior? #14

Open
rvanheusden opened this issue Jul 1, 2019 · 5 comments
Open

Inconsistencies in overlap behavior? #14

rvanheusden opened this issue Jul 1, 2019 · 5 comments

Comments

@rvanheusden
Copy link

rvanheusden commented Jul 1, 2019

Hi there,

I was doing some testing and it seems that there are some inconsistencies in how NCLS defines overlaps?

Given the following code snippet...

import numpy as np
from ncls import NCLS

starts = np.array([0, 20])
ends = np.array([10, 30])
ids = np.array([0, 1])

tree = NCLS(starts, ends, ids)

...the following results seem inconsistent:

tree.has_overlap(-1, 0)
>>> True

tree.first_overlap_both(np.array([-1]), np.array([0]), np.array([0]))
>>> (array([], dtype=int64), array([], dtype=int64)) # no results

As shown above, though -1, 0 is reported as having an overlap (using has_overlap), the same query interval fails to find an overlapping interval in the tree (using first_overlap_both).
There also seems to be some inconsistencies regarding, perhaps, the treatment of intervals containing 0?

Given this snippet...

tree.has_overlap(-1, 0)
>>> True

tree.has_overlap(19, 20)
>>> False

...it seems odd that -1, 0 matches 0, 10 (presumably) while 19, 20 doesn't match 20, 30.

@endrebak
Copy link
Collaborator

endrebak commented Jul 2, 2019

Thanks for reporting. NCLS is not meant to be used with negative positions though. What is your use-case for that?

I guess I should add a check to see that no negative numbers are used. 19, 20 does not match 20, 30 because ends are non-inclusive in Python :)

@rvanheusden
Copy link
Author

rvanheusden commented Jul 2, 2019

Yeah, it seems that -1, 0 shouldn't be matching any intervals that start with 0.

I don't actually have a use-case for negative numbers; I'm using NCLS for biological purposes, as it seems it was somewhat intended for. However, this inconsistency simply came up in my testing. I reported it because I thought it was rather strange.

I would suggest that regardless of whether negative numbers are supported or not, has_overlap and any other overlap methods should have consistent determination of what is an overlap and what isn't.

@endrebak
Copy link
Collaborator

endrebak commented Jul 3, 2019

I completely agree and I am happy you reported it. I actually only use the vectorized functions because they are so much faster.

@rvanheusden
Copy link
Author

Huh, I guess maybe I should be using first_overlap_both and then just checking for results in the interest of speed. Although, that doesn't lend itself too well to readability.

@endrebak
Copy link
Collaborator

endrebak commented Jul 4, 2019 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants