Tutorial notebook DAMP #920
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

@@ Coverage Diff @@
##             main     #920   +/-   ##
=======================================
  Coverage   98.93%   98.93%
=======================================
  Files          84       84
  Lines       14292    14292
=======================================
  Hits        14140    14140
  Misses        152      152
@@ -0,0 +1,378 @@
Line #24. For the given index i, segmentize the array np.arange(i) into
Is there a better way to describe the task of this function? How about:
"Given the index i, divide the array np.arange(i) into chunks. Ensure the last chunk's size is chunksize_init, with the preceding chunks doubling in size. The output consists of the (start, stop) indices of each chunk."
Also, we may add: "The left-most chunk always starts at 0."
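As an illustration of this description, here is a minimal sketch (the function name `get_chunks_ranges` and the truncation of the left-most chunk at 0 are assumptions, not the notebook's actual code):

```python
import numpy as np

def get_chunks_ranges(i, chunksize_init):
    """Segment np.arange(i) into chunks and return their (start, stop) indices.

    The right-most chunk has size `chunksize_init`, each preceding chunk doubles
    in size, and the left-most chunk is truncated so that it starts at 0.
    """
    ranges = []
    stop = i
    size = chunksize_init
    while stop > 0:
        start = max(stop - size, 0)
        ranges.append((start, stop))
        stop = start
        size = size * 2  # the next chunk (to the left) is twice as large
    return np.array(ranges[::-1])

# Example: get_chunks_ranges(10, 2) -> [[0, 4], [4, 8], [8, 10]]
```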
This function seems weird. It's not clear why you need to end with the last element being chunksize_init.
Something feels backwards here. The intent is hard to follow.
Also, do we actually need to call the function multiple times? Or is it possible to call the function once and then shift the indices somehow?
> Something feels backwards here

Well, it is backward (I mean... the process itself is backward), but I trust your lens! Please allow me to think more and see if I can come up with something that is more elegant.

> Also, do we actually need to call the function multiple times? Or is it possible to call the function once and then shift the indices somehow?

I wanted to do that but decided to start simple. What we can do is keep shifting the (start, stop) indices of all chunks by one in each iteration. So, for instance, if I get the chunks for the subsequences in T[:idx], then I can find the chunks for T[:(idx+1)] by just shifting the start/stop indices of the chunks by one to the right. We just need to keep track of the left-most chunk, though, as its size can become a power of two at some point, in which case the number of chunks increases by one.
[FYR... before I forget]
Another approach is to ALWAYS OFFSET FROM THE QUERY INDEX idx by s, 2s, 4s, 8s, and so on, where s is a power of two. The good thing is that the difference between any two consecutive offsets is STILL a power of two. This may result in cleaner code. IIRC, I think I saw something similar in the MATLAB code before. Going to check that as well.
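As a rough sketch of this idea (an illustration only; the name `offset_chunks` and the example values are hypothetical, not the notebook's code), the chunk boundaries would sit at idx - s, idx - 2s, idx - 4s, and so on:

```python
def offset_chunks(idx, s):
    """Return (start, stop) pairs covering np.arange(idx), obtained by
    repeatedly doubling the offset back from the query index `idx`."""
    ranges = []
    offset = s
    stop = idx
    while stop > 0:
        start = max(idx - offset, 0)  # next boundary, truncated at 0
        ranges.append((start, stop))
        stop = start
        offset = offset * 2  # s, 2s, 4s, 8s, ...
    return ranges[::-1]

# Example (hypothetical values): offset_chunks(20, 4)
# -> [(0, 4), (4, 12), (12, 16), (16, 20)]
```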
The MATLAB code uses the following lines for updating the start / stop of each chunk:
% Query is at index `i`, i.e. `query = T(i : i+SubsequenceLength-1)`
% X is the chunk size, initially set to `2 ^ nextpow2(8 * SubsequenceLength)`
X_start = i - X + 1 + (expansion_num * SubsequenceLength);
X_end   = i - (X / 2) + (expansion_num * SubsequenceLength);
% and then
approximate_distance = min(real(MASS_V2(T(X_start : X_end), query)));
X = X * 2;
expansion_num = expansion_num + 1;
The term (expansion_num * SubsequenceLength) takes into account the elements of the last subsequence in the new chunk. To keep the chunk length unchanged, X_start is shifted by the same amount.
Note 1:
According to the paper / MATLAB code, the length of the chunk, i.e. T in core.mass(Q, T) (NOT the number of subsequences), should be a power of two.
Note 2:
The reason for requiring a power-of-two chunk length is to speed up the MASS algorithm (according to the authors).
So, I did a quick check to see how much having a power-of-two length affects the performance.
import time

import numpy as np
import stumpy
from stumpy import core

seed = 0
np.random.seed(seed)
T = np.random.rand(10_000_000)
m = 50
T, M_T, Σ_T, T_subseq_isconstant = stumpy.core.preprocess(T, m)

query_idx = 600000
start = 0  # presumably 0 (not shown in the original snippet), so the chunk length equals `stop`
t_lst = []
for stop in range(2 ** 20 - 1000 - 1, 2 ** 20 + 1000 + 1):
    time_start = time.time()
    QT = core.sliding_dot_product(
        T[query_idx : query_idx + m],
        T[start : stop],
    )
    D = core._mass(
        T[query_idx : query_idx + m],
        T[start : stop],
        QT=QT,
        μ_Q=M_T[query_idx],
        σ_Q=Σ_T[query_idx],
        M_T=M_T[start : stop - m + 1],
        Σ_T=Σ_T[start : stop - m + 1],
        Q_subseq_isconstant=T_subseq_isconstant[query_idx],
        T_subseq_isconstant=T_subseq_isconstant[start : stop - m + 1],
    )
    time_end = time.time()
    t_lst.append(time_end - time_start)
And then I plot it:
import matplotlib.pyplot as plt

indices = np.arange(2 ** 20 - 1000 - 1, 2 ** 20 + 1000 + 1)
indices = indices[2:]
t_lst = t_lst[2:]

idx = np.flatnonzero(indices == 2 ** 20)[0]

plt.figure(figsize=(20, 5))
plt.scatter(indices[idx], t_lst[idx], color='r', label='chunksize = 2 ** 20')
plt.plot(indices[idx - 200 : idx + 200], t_lst[idx - 200 : idx + 200])
plt.ylabel('running time')
plt.legend()
plt.show()
Well, it seems the running time for the chunk size 2 ** 20 is low (not necessarily the lowest), but it should be okay. To remain faithful to the paper, I am going to use a power-of-two length for each chunk.
@@ -0,0 +1,569 @@
Line #28. while start < chunk_stop:
Is this code readable? I tried to vectorize it but couldn't figure out how. If the number of subsequences in each chunk were supposed to be a power of two, I think I would be able to vectorize it. However, according to the paper, the size of the chunk itself should be a power of two. In other words, the number of subsequences is something like 2 ** num - m + 1.
Since I couldn't find a cleaner solution, I am going with naive_get_range_damp for now.
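A tiny numeric illustration of the point above (hypothetical numbers):

```python
num, m = 10, 50
chunk_len = 2 ** num            # the chunk itself has a power-of-two length: 1024
n_subseqs = chunk_len - m + 1   # but it contains 975 subsequences, not a power of two
print(chunk_len, n_subseqs)     # 1024 975
```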
I think I need to see more examples of what the inputs/outputs are supposed to be in order to understand what is expected. Right now, I'm very confused. Can you give me some concrete examples and any gotchas (i.e., when things aren't perfectly square)?
@@ -0,0 +1,569 @@
Line #67. PL[start : stop - m + 1] = np.minimum(PL[start : stop - m + 1], D)
Previously, we had the variable pruned, which was a boolean vector (of length `len(T) - m + 1`), where pruned[i] is True if the i-th subsequence is pruned. And, instead of the line above (i.e. line #67), we had:

mask = np.argwhere(D < bsf) + start
pruned[mask] = True

Recall that the paper's original algorithm does not compute PL[i] if pruned[i] is True; it just skips it. In that case, the original algorithm sets PL[i] as follows:

PL[i] = PL[i-1]

which makes it difficult to find the correct index of the discord. The MATLAB code does a hack instead:

PL[i] = PL[i-1] - 0.000001

and this does not seem to be a clean solution.
So, instead, I am updating the (approximate) matrix profile PL by using the computed distance profile D. This helps us avoid the hack, and I think it should not be computationally more expensive compared to np.argwhere(D < bsf).
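For illustration, here is a small side-by-side of the two update styles discussed above, using made-up values (the array sizes and variable names are assumptions, not the notebook's code):

```python
import numpy as np

bsf = 2.0                       # best-so-far discord distance
D = np.array([1.5, 2.5, 0.7])   # distance profile of the current chunk
start, stop, m = 4, 9, 3        # hypothetical chunk boundaries and window size

PL = np.full(10, np.inf)        # approximate (left) matrix profile
pruned = np.zeros(10, dtype=bool)

# Previous approach: only mark which subsequences can be pruned
mask = np.flatnonzero(D < bsf) + start  # equivalent to np.argwhere(D < bsf) + start
pruned[mask] = True

# New approach: fold the computed distances directly into PL, so the
# `PL[i] = PL[i-1] - 0.000001` hack is no longer needed
PL[start : stop - m + 1] = np.minimum(PL[start : stop - m + 1], D)
```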
@@ -0,0 +1,569 @@
Lines 60-68 update the chunks_range by shifting it by one to the right in each iteration. I feel the code is not that clean! I am still trying to figure out if there is a better way to update the chunks range. Any thoughts?
Would you please check out the comments and let me know what you think? In particular, I need your support to figure out:
Note [Regarding exclusion zone]:
@NimaSarajpoor Let's start with these comments first before we dive into DAMP.
A few updates:
(1) The proposed
(2) The running time of the DAMP algorithm increases drastically when
(3) An experimental top-k DAMP code is provided on the supporting webpage. I tried to make some changes to our current DAMP to reflect the top-k code. I did a few tests and I think it works (i.e. the results are the same as the top-k discords from naive DAMP). Btw, I did not push these changes to this PR. How it works:
(4) Is only ONE discord interesting?
I think it is worth profiling the code to see in which function most of the time is being spent. Does the paper offer any clues on what the timings should be? How does the Matlab code perform?
I profiled the computing time for:
I ran DAMP version 2.0 provided on the supporting website using the same input as Example 3 (of our notebook in this PR). The output matches (see below) and the running time is ~56 sec (which is close to our current implementation of DAMP).
Note:
Working on it...
I didn't realize that STUMPY was that performant even for
Update: According to the site for the paper, the first example shown says:
For a CPU that was released in Q2 of 2018 and has only 6 cores/threads, 22.3 seconds seems really fast for 8 million data points. I'd be curious if we could reproduce these results using their Matlab code or our DAMP code.
Thank you. At least this confirms that we're not doing anything horribly wrong compared with the MATLAB code? But is this really scalable? After you've investigated, it might be good to check with Eamonn and his team.
Presumably, this shouldn't change the performance much?
Indeed exciting!
According to this video, the
I will get to this after the investigation. Before I forget, I am going to share a couple of observations. We may discuss/ignore them later after trying out the case you suggested above.
(1) Regarding the trillion-experiment data: The data is a txt file containing values, one per line. The value returned by
The data is about ~3GB. The maximum size that can be uploaded to the MATLAB online server is
(2) I think the performance can be affected by the location of the discord. So, in addition to the running time, it is worth reporting the location of the discord as well.
Excellent. Please let me know if there is anything I can do to help support you!
I couldn't find the time series with 8M data points. The
Case 1-1:
Case 1-2:
Note 1: Indexing in MATLAB starts at 1.
So, if I have data that has some repetitive patterns every, say, 2 * m period, then approx. 6 * m subsequences are being processed in the first chunk (almost) unnecessarily(?). However, it is possible to just early-abandon the neighbours of a subsequence after considering just the first
Is the running time of
If yes... then we may even avoid doubling the chunksize in each iteration of the backward process and just go backward using the same chunksize!
Next step
[Update] Now, the computing time of MATLAB in Case 1-1 becomes 152 seconds.
[Update 2] I think it is important to test the performance when
@NimaSarajpoor Thanks for running the comparison. From what I can tell, the Python implementation seems to be around the same performance as Matlab (maybe a little faster)? What should be the next steps? Do we have questions that we should pose to Eamonn (especially with regard to reproducibility)?
@seanlaw Line 426 in 21840d7
However, I think: So, for now, I just added
Correct. I just added
(1) I think we can start reproducing a couple of figures of the paper.
I can think of three questions:
Regarding the last item (i.e. the performance of DAMP when the window size is large):
Then:
Cool! I think reproducing the figures/performance found in the paper is key.
We should definitely get some clarification for this.
Note that Eamonn likely has no idea what
Based on your test of this, DAMP seems pretty bad as the window size increases. Can you confirm that this is also true when using the MATLAB code? If "yes", then we should definitely ask Eamonn and his students.
For the sliding dot product between a query, Q, and a time series, T, I tried MATLAB (fft) and Python (fft / njit). The computing times for these three cases are shown below. Notice that Python with njit seems to depend on the window size, whereas that is not the case for Python fft (with SciPy). However, to decorate the backward / forward functions with the njit decorator (to speed up computation), the sliding dot product cannot take advantage of SciPy's fft, as Numba does not support it!
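For reference, here is a minimal sketch of the two Python variants being compared (an illustration under my assumptions, not STUMPY's exact implementation):

```python
import numpy as np
from numba import njit
from scipy.signal import convolve

def sliding_dot_product_fft(Q, T):
    # Cross-correlation of T with Q == convolution of T with the reversed Q;
    # with method="fft" the cost is roughly O(n log n), independent of len(Q)
    return convolve(Q[::-1], T, mode="valid", method="fft")

@njit
def sliding_dot_product_njit(Q, T):
    # Direct O(n * m) computation; its cost grows with the window size m
    m = Q.shape[0]
    out = np.empty(T.shape[0] - m + 1)
    for i in range(out.shape[0]):
        out[i] = np.dot(Q, T[i : i + m])
    return out
```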
Now that we have checked the performance of the sliding dot product, I will check the MATLAB DAMP performance considering different window sizes, as you suggested. Will provide an update. @seanlaw
Yikes, our
Can you please share the code that you used? I think it might be worthwhile posting any comparisons to both the
Update
It appears that
Note, at some point, we may want to implement our own parallel
Yes... Below, I am showing what I meant by each case:
Btw, I think it makes sense that the computing time in the last case depends on the window size
All the following three cases were run on the MATLAB online server.
Case: [MATLAB]
Case: [Python]
Case: [Python]
Right. I read the doc. It has a parameter called
Will try it out. Note that this will affect the case that uses
Interesting! I will see if I can easily/safely replace scipy convolve with numpy convolve, and then compare again.
I understand. In fact, I already tried rocket-fft, which supports Numba for
@seanlaw
I was curious as to why Matlab's FFT might be faster and came across this post.
Also: https://www.reddit.com/r/Julia/comments/5wel0j/fft_speed_vs_matlab/
In 2017, I experimented with FFTW and PyFFTW but hadn't noticed any significant performance difference, while noting that they were much harder to use (due to their need to make "plans"). I understand that FFTW uses a ton of tricks and optimizations under the hood that could help make DAMP faster but, after a quick search, it looks like FFTW's license (GPL) is incompatible with STUMPY's (BSD-3) license. At some point, we may just need to accept our performance-dependency trade-off. Maybe there's a nice way to detect whether
I understand. The best option is to see if we can improve the performance with the current dependencies. If not, then...
That makes sense. I created issue #938 to continue our conversation regarding the performance of the sliding dot product.
We recently noticed that Python DAMP's running time increases as the window size,
(1) Comparing the "sliding dot product" functions
(2) Comparing Python Naive (i.e. with STUMP using multiple threads) vs Python DAMP (using one thread but njit-decorated with fast math). We used
Recall that the current version of MATLAB DAMP provided on the supporting webpage has some differences compared to the original algorithm proposed in the paper. Therefore, I am going to show the results for both versions. To reproduce the following results, here is the code:
(1) MATLAB DAMP Original
The following figure shows the running time of the original DAMP for different window sizes. Groups of adjacent data points with the same color have the same
(2) MATLAB DAMP v2
Now, let's get back to our question.
According to my experiments, I think it is true. Let's compare MATLAB DAMP with Python DAMP (njit version), for different window sizes, by running the latter on the MATLAB online server.
Now, let's compare again but with a larger window size,
We can now see the impact of the higher time complexity of the sliding dot product in Python.
Finally, let's just focus on MATLAB DAMP and consider a wide range of values
Groups of adjacent data points with similar colors are associated with window sizes that have the same
As an aside, I should mention that the MATLAB code has the following block that gives a warning and suggests a window size when the value provided by the user is either too small or too large
What do the above results mean for us?
(1) The performance depends on the size of
(2) The sliding dot product without the fft trick has time complexity
Thanks @NimaSarajpoor
Thanks for the good work @NimaSarajpoor. Do you plan to add anything else, or is DAMP ready to roll? One comment:
From slides here: https://docs.google.com/presentation/d/1tQQfKuKrOa3j5-WJ9peoYfZGOnJbBcl9/edit#slide=id.p31
Not ready yet. We are currently focusing on implementing a numba-friendly Fast Fourier Transform (FFT), which is an important component of DAMP (see #938 and #939 to track the progress on FFT).
Ahhh.... good catch! Thanks! 😄
This PR is a replacement for PR #872.
This PR is to address #606.
I will start by implementing the original algorithm, as proposed in the paper, as closely as possible. Then, I will add notes on how to optimize the code. I will also consider the MATLAB code.
Notes
- In contrast to the original paper, the MATLAB code currently uses the following logic to compute the initial chunksize for backward processing: 2 ^ nextpow2(8 * m); pay attention to the 8.
- As a follow-up to the previous point: what happens if m is already a "power of two" number? (See the quick check after this list.)
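As a quick numerical check of that follow-up question (a sketch; next_pow2 below is a stand-in for MATLAB's nextpow2): when m is already a power of two, 8 * m is also a power of two, so 2 ^ nextpow2(8 * m) is exactly 8 * m and no rounding up occurs.

```python
import math

def next_pow2(x):
    # smallest p such that 2 ** p >= x (mirrors MATLAB's nextpow2 for x > 0)
    return math.ceil(math.log2(x))

for m in (32, 50):                    # 32 is a power of two, 50 is not
    chunksize = 2 ** next_pow2(8 * m)
    print(m, chunksize)               # 32 -> 256 (= 8 * m), 50 -> 512 (> 8 * m = 400)
```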