Make excl_zone a parameter of stumpy.motif() #1050

Open
JulienLeprince opened this issue Dec 12, 2024 · 7 comments
Labels
question Further information is requested

Comments

@JulienLeprince

Hi,

stumpy.motifs() identifies a lot of overlapping motifs in my time series. Would it be possible to add excl_zone as an input parameter to the function so that users can adjust this unwanted behavior?

Kind regards,
Julien

@TDAmeritrade TDAmeritrade deleted a comment Dec 12, 2024
@seanlaw
Contributor

seanlaw commented Dec 12, 2024

@JulienLeprince Thank you for your question and welcome to the STUMPY community.

would it be possible to add excl_zone as an input parameter to the function so that users can adjust this unwanted behavior

IIRC @NimaSarajpoor looked at this a long time ago (I don't recall the details and I can't seem to find the original conversation) but I vaguely recall that excluding regions next to an already-discovered motif (candidate) subsequence would lead to unintended consequences. @NimaSarajpoor do you remember anything about this or was it related to something else?

Note that an exclusion zone is already applied to neighboring areas surrounding subsequences that match a candidate motif subsequence.

@JulienLeprince
Author

Thanks for the welcome and prompt reply Sean. To be clear:

  • the exclusion zone works well for spotting an isolated motif A, i.e. there are no overlaps between identified motifs within that same group.
  • however, subsequently identified motif groups, say B and C, are typically shifted versions of the initial A motif. Increasing the sequence length eventually removes the unwanted behavior but at the cost of not identifying other desirable motifs.

@seanlaw
Contributor

seanlaw commented Dec 12, 2024

however, subsequently identified motif groups, say B and C, are typically shifted versions of the initial A motif. Increasing the sequence length eventually removes the unwanted behavior but at the cost of not identifying other desirable motifs.

Yes, I understand. However, since the exclusion zone is applied both upstream and downstream of a motif, it can also cause desirable motifs to be missed. In other words, it's a double-edged sword, and we'd rather err on the side of NOT ignoring potential motifs since we (the developers) can't tell whether a subsequence is "important" or not. It also certainly depends on the size of your window, m, and the length of your time series.

@JulienLeprince Oftentimes, I like to ask whether you could do a bit of post-processing: set max_motifs to some large number, then go through each motif and check whether it is "too close" to one that was previously found. Frankly, I think this is the "safer" thing to do. As I think about it a bit more, you might also be able to set max_motifs=10 and, if, say, the first motif is "correct" but the other 9 are "too close" to it, take your P and doctor it by setting the distances at those 9 subsequence locations to np.inf and then run stumpy.motifs again. Purely just thinking out loud here (without having consumed any coffee today)!
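
A minimal sketch of the post-processing idea above, assuming you already have the motif start indices in discovery order (for example, the first column of the motif_indices array returned by stumpy.motifs). The helper name `drop_nearby_motifs`, the sample indices, and the `min_sep` threshold are all hypothetical and only for illustration:

```python
def drop_nearby_motifs(motif_starts, min_sep):
    """Keep motifs, in discovery order, whose start index is more than
    `min_sep` away from every motif that has already been kept."""
    kept = []
    for start in motif_starts:
        if all(abs(start - k) > min_sep for k in kept):
            kept.append(start)
    return kept


# Hypothetical start indices of ten motifs found with a large max_motifs
motif_starts = [120, 133, 109, 480, 141, 467, 98, 900, 128, 493]
m = 50  # window size (assumed)

# Treat anything within half a window of a kept motif as "too close"
kept = drop_nearby_motifs(motif_starts, min_sep=m // 2)
print(kept)
```

The threshold is a user choice here rather than a library setting, which keeps the decision about what counts as "too close" in your own code.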

I truly don't mean to be difficult here but stumpy.motifs was meant to be a reasonable starting point for analyzing your matrix profile (a simple helper function) and it was never meant to satisfy more specific conditions. Anything beyond its super basic functionality probably means that you should/could start rolling your own motif_finder function by copying the stumpy.motifs function. I would encourage that, and you could possibly share your code in our Discussions section.

@NimaSarajpoor
Collaborator

NimaSarajpoor commented Dec 13, 2024

@seanlaw

do you remember anything about this or was it related to something else?

I think we had a relevant discussion but I couldn't find it. I checked the code again and noticed a few things, which I share below. Please let me know if you notice any mistakes.

@JulienLeprince
First, let's have a quick review of stumpy.motifs. Let's say we are looking for motifs of length m in the time series T. We can use the function stumpy.motifs, and get the output motif_indices:

# motif_indices
array([
[A0, A1, A2, A3, A4],
[B0, B1, B2, B3, B4],
...
])

A0 is the index of the first motif, and the indices of its closest matches are A1, A2, A3, and A4.
B0 is the index of the second motif, and the indices of its closest matches are B1, B2, B3, and B4.

Note 1:
excl_zone is considered between the two motifs A0 and B0 , i.e. $|A0 - B0| > excl\textunderscore{zone} $

Note 2:
excl_zone is also considered between the matches of a motif. In other words:

$\forall x, y \in set(A0, A1, A2, A3, A4),\ x \neq y: |x - y| > excl\textunderscore{zone} $
AND
$\forall x, y \in set(B0, B1, B2, B3, B4),\ x \neq y: |x - y| > excl\textunderscore{zone} $

Note 3:
The matches of motif B0 can even be the same as the matches of motif A0. In other words, the following logic is NOT implemented:

$\forall{x} \in set(A1, A2, A3, A4), \forall{y} \in set(B1, B2, B3, B4): |x - y| > excl\textunderscore{zone} $


  • If your challenge is related to Note 1 or Note 2 above, it can be fixed by changing the excl_zone in stumpy.config.
  • If your challenge is related to Note 3, then I think you can try to modify the code. However, IMO, this can/may result in missing (not capturing) some motifs. We can discuss it further if needed.
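
For the first bullet, a rough sketch of how the exclusion zone scales with the window size. It assumes the zone is the ceiling of m divided by a denominator that defaults to 4 (consistent with the "+/- m / 4" rule mentioned in this thread). As far as I know the corresponding setting is stumpy.config.STUMPY_EXCL_ZONE_DENOM, but please verify both the name and the formula against your installed version. The helper `exclusion_zone` is hypothetical and does not call STUMPY:

```python
import math


def exclusion_zone(m, denom=4):
    """Assumed rule: the zone covers ceil(m / denom) indices on each side."""
    return math.ceil(m / denom)


m = 50  # window size (assumed)

default_zone = exclusion_zone(m)         # denominator 4
wider_zone = exclusion_zone(m, denom=1)  # a full window on each side

print(default_zone, wider_zone)
```

A smaller denominator widens the zone, which suppresses more shifted near-duplicates but also risks hiding distinct motifs that happen to sit nearby.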

@seanlaw
Contributor

seanlaw commented Dec 13, 2024

Note 1:
excl_zone is considered between the two motifs A0 and B0 , i.e. $|A0 - B0| > excl\textunderscore{zone} $

@NimaSarajpoor Please correct me if I'm wrong but I don't think this is correct. From what I can see in the stumpy.motifs code, excl_zone appears to only be applied to the matches such that each match (i.e., [A1, A2, A3, A4]) is prevented from becoming a motif. However, I don't think there is anything preventing A0 + 1 (i.e., one index location next to A0) from becoming the next candidate motif (i.e., B0 can be A0 + 1 or even A0 + m / 4 - 1). Maybe I'm overlooking this in the code?

@NimaSarajpoor
Collaborator

NimaSarajpoor commented Dec 13, 2024

From what I can see in the stumpy.motifs code, excl_zone appears to only be applied to the matches such that each match (i.e., [A1, A2, A3, A4]) is prevented from becoming a motif. However, I don't think there is anything preventing A0 + 1 (i.e., one index location next to A0)

stumpy/stumpy/motifs.py

Lines 140 to 143 in 3165d1c

for idx in query_matches[:, 1]:
core.apply_exclusion_zone(P, int(idx), excl_zone, np.inf)
candidate_idx = np.argmin(P[-1])

A0 (the start index of the motif itself) is included in the query_matches[:, 1], and hence its close neighbours are excluded before it chooses the next best motif candidate.
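
To make that loop concrete, here is a simplified, pure-Python sketch of greedy candidate selection with an exclusion zone. It is not the STUMPY implementation: `mask_exclusion_zone` is a stand-in for core.apply_exclusion_zone, `pick_candidates` and `matches_of` are hypothetical names, and the toy profile values are made up. The matcher below returns only the candidate itself, which is enough to show that masking the candidate's own index keeps its immediate neighbours from being chosen next:

```python
import math


def mask_exclusion_zone(profile, idx, excl_zone):
    """Set profile[idx - excl_zone .. idx + excl_zone] to infinity, in place."""
    start = max(0, idx - excl_zone)
    stop = min(len(profile), idx + excl_zone + 1)
    for i in range(start, stop):
        profile[i] = math.inf


def pick_candidates(profile, matches_of, excl_zone, max_motifs):
    """Repeatedly take the smallest profile value, then mask the zone
    around each of its matches (the candidate itself is one of them)."""
    profile = list(profile)  # work on a copy, leave the input untouched
    picked = []
    for _ in range(max_motifs):
        candidate = min(range(len(profile)), key=profile.__getitem__)
        if math.isinf(profile[candidate]):
            break  # everything left is masked
        picked.append(candidate)
        for idx in matches_of(candidate):
            mask_exclusion_zone(profile, idx, excl_zone)
    return picked


# Made-up profile: indices 1, 2, 3 are near-duplicates of one another
profile = [0.9, 0.2, 0.1, 0.15, 0.8, 0.7, 0.6, 0.3, 0.95, 0.85]

picked = pick_candidates(profile, lambda c: [c], excl_zone=1, max_motifs=3)
print(picked)
```

Index 3 has the second-smallest value but is never picked, because it falls inside the zone masked around index 2.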

A quick check

import numpy as np
import stumpy

seed = 0
np.random.seed(seed)

T = np.random.rand(20)
T[:3] = 0.0
T[-3:] = 0.0
m = 3

mp = stumpy.stump(T, m=m)
query_matches = stumpy.match(T[:3], T)

print(query_matches[:, 1])
# [0 17]  (this contains the index `0`)

As an aside, thanks for mentioning that matches (and their "close" neighbours) are prevented from becoming the next motif. In my previous comment, Note 1 did not reflect that. I am providing its revised version below:

Note 1 (revised)
$\forall{x} \in set(A0, A1, A2, A3, A4): |B0 - x| > excl\textunderscore{zone} $
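
The revised Note 1 and Note 2 can be written as simple checks on a motif_indices-style result. This is only an illustrative sketch: the function names and the index values are hypothetical, and excl_zone = 13 is just an assumed value:

```python
from itertools import combinations


def respects_note_2(group, excl_zone):
    """Every pair of indices within one motif group is more than excl_zone apart."""
    return all(abs(x - y) > excl_zone for x, y in combinations(group, 2))


def respects_revised_note_1(prev_group, next_motif, excl_zone):
    """The next motif is more than excl_zone away from the previous motif
    and from each of its matches."""
    return all(abs(next_motif - x) > excl_zone for x in prev_group)


excl_zone = 13  # assumed value

group_a = [100, 340, 612, 20, 871]  # A0 followed by its matches
group_b = [130, 340, 612, 455, 20]  # B0 followed by its matches

# Groups A and B may share matches (Note 3), yet both notes still hold
print(respects_note_2(group_a, excl_zone))
print(respects_note_2(group_b, excl_zone))
print(respects_revised_note_1(group_a, group_b[0], excl_zone))

# A motif only 5 indices away from A0 would violate the revised Note 1
print(respects_revised_note_1(group_a, 105, excl_zone))
```

Note that B0 = 130 passes the check even though a window starting there still overlaps the one starting at A0 = 100 when m is around 50, which may be the kind of "shifted" motif described earlier in this thread.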

@seanlaw
Contributor

seanlaw commented Dec 13, 2024

A0 (the start index of the motif itself) is included in the query_matches[:, 1], and hence its close neighbours are excluded before it chooses the next best motif candidate.

Ohhhhhhhh. Very sneaky. I totally forgot about that!! Nice catch. I almost feel like we SHOULD add a comment to remind future-self.

In light of this, @JulienLeprince are you able to provide a concrete example (with data) of where the "other" motifs are "inside" of the exclusion zone of a previously identified motif? Note that the exclusion zone is within +/- m / 4 of some index and m is your window size.

@seanlaw seanlaw added the question Further information is requested label Dec 25, 2024