Computing matrix profile while considering a custom transformation for subsequences #942
Comments
The two examples above can be covered by the general case below: If our distance function is Euclidean Distance (i.e. Let's say we have two subsequences And, let's say we want to just apply some offset to them before computing their distance,
where
Note that the first term is just Let Lines 121 to 140 in 3559b38
to this:
|
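As a rough numerical check of the idea in this comment, here is a minimal sketch. It assumes the transformation is an additive offset per subsequence (either a scalar or a pre-determined array of length m). The function name `offset_sq_dist` and the `offset_a` / `offset_b` arguments are hypothetical and are not STUMPY parameters. The sketch expands the squared Euclidean distance so that the ordinary non-normalized squared distance appears as the first term, followed by two correction terms that depend on the offsets.

```python
import numpy as np


def offset_sq_dist(a, b, offset_a, offset_b):
    """Squared Euclidean distance between (a + offset_a) and (b + offset_b).

    Expanded as ||a - b||^2 + 2 * (a - b) . delta + ||delta||^2,
    where delta = offset_a - offset_b (scalar or length-m array).
    """
    diff = a - b
    delta = np.asarray(offset_a - offset_b, dtype=float)
    plain = np.sum(diff**2)  # ordinary non-normalized squared distance
    cross = 2.0 * np.sum(diff * delta)  # interaction between data and offsets
    const = np.sum(np.broadcast_to(delta, a.shape) ** 2)  # offsets only
    return plain + cross + const


rng = np.random.default_rng(0)
a = rng.normal(size=8)
b = rng.normal(size=8)

# Case 1: scalar offsets, e.g. adding the negative mean of each subsequence
d_scalar = offset_sq_dist(a, b, -a.mean(), -b.mean())
direct_scalar = np.sum(((a - a.mean()) - (b - b.mean())) ** 2)

# Case 2: a pre-determined set of values per subsequence
oa = rng.normal(size=8)
ob = rng.normal(size=8)
d_vec = offset_sq_dist(a, b, oa, ob)
direct_vec = np.sum(((a + oa) - (b + ob)) ** 2)
```

For scalar offsets the cross term reduces to a difference of subsequence sums times a constant, so it could in principle be obtained from rolling sums rather than by materializing each transformed subsequence.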
Considering that we are only adding either a (negative) constant or some pre-determined set of values for each subsequence, I think we should consider new parameters called |
Definitely right! 😄 |
We will also need to consider what this means for the multi-dimensional case and its effect on the MDL (minimum description length) |
I tried to go through the source code. I realized that, in non-normalized MDL, we obtain the global I think that we might be able to use the same approach after applying But, how to test this proposal? (2) We insert two similar patterns with the same average, say 0, into a randomly generated time series, and then we try to see if we can detect them using the existing code. Then, if it works, we can move one of the motifs up / down, and try to capture them again using One thing that is certain is that MDL is complicated 😄 |
Where exactly are you seeing this? I'm looking at |
In Lines 275 to 281 in 3559b38
And the function Lines 165 to 195 in 3559b38
And then, in Lines 293 to 296 in 3559b38
|
I have no idea. I can't remember why we used Min Max Scaling for the discretization :(
I'm not sure where this statement came from 😆 |
I will do some research to see if I can find something.
And, this is from you who is very careful. And, now I can better see why you try to be very careful in things that need to be maintained later. Even with that level of due diligence, there are things that may go slightly wrong (By "wrong", I do not mean incorrect. I mean it may lose its clarity). |
I remember now! In one of the other MDL comments, I had asked the question:
Michael Yeh actually responded:
Then, I later discovered:
So, at least it is based on published work. I still don't feel comfortable discretizing a subsequence that is transformed by |
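For readers following the discretization discussion, here is a minimal sketch of min-max scaling followed by rounding to integer levels. The helper name `discretize_min_max`, the `n_bit` default, and the clipping step are assumptions made for illustration. This is not a copy of STUMPY's implementation. It also shows why an offset matters when the min and max are global: shifting a subsequence changes the levels it maps to.

```python
import numpy as np


def discretize_min_max(a, a_min, a_max, n_bit=8):
    """Map values from [a_min, a_max] onto integer levels 1..2**n_bit.

    Values outside [a_min, a_max] are clipped (an assumption of this sketch).
    """
    a = np.asarray(a, dtype=float)
    scaled = np.clip((a - a_min) / (a_max - a_min), 0.0, 1.0)
    return np.round(scaled * (2**n_bit - 1)).astype(np.int64) + 1


T = np.array([0.0, 1.0, 2.0, 5.0, 10.0])

# Discretize the whole series using its global min and max
levels = discretize_min_max(T, T.min(), T.max(), n_bit=3)

# The same subsequence, before and after adding a constant offset,
# lands on different levels when the global min/max stay fixed
sub_levels = discretize_min_max(T[:3], T.min(), T.max(), n_bit=3)
shifted_levels = discretize_min_max(T[:3] + 5.0, T.min(), T.max(), n_bit=3)
```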
Currently, STUMPY supports the parameter normalize, which allows users to compute the matrix profile for the following cases:
(1) normalize == False: compute the distance between subsequences with no transformation
(2) normalize == True: z-normalize subsequences before computing the (Euclidean) distance
There has been some interest in using a different transformation:
Usually, when the volume of data is small, it is better to just get all the subsequences, do the transformation, and then compute the full distance matrix (see discussion in #900). But what if the volume of data is large? In such a case, an efficient way to compute the distance under the custom transformation would be useful.
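The small-data approach described above can be sketched as follows. This is a brute-force illustration under stated assumptions, not STUMPY's API: the function names `naive_transformed_distance_matrix` and `naive_matrix_profile`, the `transform` callable, and the exclusion zone value are all hypothetical. Memory grows quadratically with the number of subsequences, which is exactly why it does not scale to large inputs.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def naive_transformed_distance_matrix(T, m, transform):
    """Full pairwise Euclidean distance matrix between transformed subsequences."""
    subseqs = sliding_window_view(np.asarray(T, dtype=float), m)  # (n - m + 1, m)
    X = np.array([transform(s) for s in subseqs])  # apply the custom transformation
    diff = X[:, None, :] - X[None, :, :]  # all pairwise differences
    return np.sqrt(np.sum(diff**2, axis=-1))


def naive_matrix_profile(D, excl_zone):
    """Row-wise minimum of D, ignoring trivial matches near the diagonal."""
    D = D.copy()
    n = D.shape[0]
    for i in range(n):
        lo, hi = max(0, i - excl_zone), min(n, i + excl_zone + 1)
        D[i, lo:hi] = np.inf
    return D.min(axis=1)


rng = np.random.default_rng(42)
T = rng.normal(size=64)
m = 8

# Example custom transformation: subtract the mean of each subsequence
D = naive_transformed_distance_matrix(T, m, lambda s: s - s.mean())
P = naive_matrix_profile(D, excl_zone=m // 4)  # exclusion zone size is an assumption
```

Any callable that maps a length-m array to a length-m array can be passed as `transform`, which makes this a convenient reference for checking a faster implementation on small inputs.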