-
Notifications
You must be signed in to change notification settings - Fork 325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ignore_trivial does not work as expected #929
Comments
@frankl1 Thank you for your question and for your kind words. Here are a few observations as I work through the example:
Having said all of that, there will be extreme/exceptionally rare circumstances where one might want/need to allow self-match/trivial-match and here is a (bad, very bad, don't do this as this is not supported) solution:
which gives:
So, all of the distances are zero (please mind the limited precision) and the index for the nearest neighbor is the self-match for all subsequences. Essentially, we are "tricking"
And the answer is "no" because, behind the scenes, we have implemented some checks that compare whether
Unfortunately/fortunately, this is the expected behavior. To reiterate, self-matches/trivial-matches for self-joins are avoided unless the user tricks |
@seanlaw what a clear and nice to read reply. Thank you very much for all the clarification. I was interpreting the meaning of the parameter in the wrong way. Allow me to say that the document is not clear about the meaning Regarding self-join with Clearly, everything is working as expected 😄 , however, I thing this behaviour is misleading (it led me to opening this issue 😄), even for new users. If the parameter is overwritten under the hood, it would be useful to show a warning to the user. Something like " Additionally, I will suggest to add the exclusion zone as parameter that the user could change and use Once more time, thank you for the quick and in-depth reply. |
Your feedback is well received. I've opened a new issue to track this but a PR is also welcome 😄
I have misspoken. If something is in fact changed underneath the hood, a warning is presented. So, both:
produce the exact same results (
See this code for more details: Lines 3638 to 3692 in 58ccc69
Note that version 1.9.2 may be too outdated and these warnings/changes may not exist. Please try the latest version to confirm.
Hmmm, maybe I'm not understanding your point. While it may be true that two identical time series would have identical top-k matrix profiles, it seems like a really roundabout way to confirm that the two time series are the same. Instead, one could/should do something like: Lines 471 to 498 in 58ccc69
Note that this check is much, much cheaper than computing the matrix profile, especially as the length of your time series gets larger.
A long, long time ago, the exclusion zone was a function parameter. However, rather than have it be a local function parameter, we decided to move this to a global configuration parameter via:
And then, in ALL functions where the exclusion zone is relevant, the exclusion zone would be computed as:
The reason why we decided to do it this way was because, you guessed it, we wanted to prevent the user from shooting themselves in the foot. One of the most natural things that one might do is:
Originally (i.e., a long, long time ago), you might have done:
However, by omitting
And all functions are "aware" and will use the same exclusion zone accordingly. 99.9% of the time, users will never change the exclusion zone. For the minority of cases, changing the exclusion zone should be a very explicit/deliberate action. We would rather users ask than spend all of our time repeatedly trying to help users troubleshoot a forgotten |
Thanks for the reply,
For some reasons, conda was not picking the latest stumpy during installation. I guess it was because of compatibility with existing packages in my environment. So, I created a fresh environment and installed the latest version (1.12.0). I can see some differences, I have warnings...from time to time: Running But running However, subsequent executions of the same code do not give the warning anymore. I have to restart my Jupiter notebook for the warning to show up. Don't you think
def stump(tA, m, tB):
if arrays_are_equal(tA, tB):
mp = np.empty((len(tA), 4))
mp[:, 0] = 0
mp[:, 1] = np.arange(len(tA))
mp[:, 2:] = -1
return mp
return _stump(tA, m, tB) This way, the warning becomes optional, and there is no need to overwrite the trivial matching parameter. Again, I am new to this library, so I don't know all the constraints to take into account. Let me know if I am missing something here. Your explanation on the exclusion is clear, I am happy that there is a way to change it.
mp = stumpy.stump(T, m, excl_zone=8) # This no longer works
motif_distances, motif_indices = stumpy.motifs(T, mp[:, 0]) # Notice that no exclusion zone is specified so a default is used I might be missing something here, stumpy.stump() already returns the distances and the indices. So why does stumpy.motifs() still need the exclusion zone? Isn't the following code doing the same: mp = stumpy.stump(T, m, excl_zone=8)
motif_indices= np.argsort(mp[:, 0])[:k] # top k motifs
motif_distances = mp[motif_idx, 0] |
Yes, unlike Python logging, Python warnings are only set up to warn you the first time something happens rather than repeatedly spamming you. It's a blessing and a curse but warnings can be easily caught/ignored programmatically while logging is difficult to "catch". When you restart the Jupyter notebook this essentially clears the cache. You'll notice that this is similar to or consistent with how some warnings are handled in
Maybe?! I can see it both ways. Purely from an API standpoint (and ignoring the fact that the time series are the same), the first is a self-join and the latter is an AB-join, which should have different expectations. While I understand your rationale, IMHO, I think 99.9% of users will have no reason to do either of these things. I don't mean to come across as argumentative/stubborn but given that this is a super edge case, I don't think there is perfect solution. I'll try to spend some time to think about it.
I will have to think about this more. It might be doable |
Hi,
I am trying stumpy, I really love the library by the way.
I have the impression that the parameter
ignore_trivial
does not work as expected. Here is a minimum example.I expect this code not to exclude trivial matches, and therefore the mp should be 0 everywhere as this is a self-join. However, the result is the following:
Even more suprising, the result is the same if I set ignore trivial to False.
Clearly, something is not working as expected.
I am using Stumpy 1.9.2
Thanks
The text was updated successfully, but these errors were encountered: