-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regarding performance #131
Comments
This PR #132 , makes the changes described above. As a result the new measurement for the build while Huge performance gain. |
Given the problems we encountered at #136, we can now confirm that the hash check isn't in vain. The performance overhead isn't to be ignored though so I think the solution provided by #136 is a much preferred one and if we were to anyway keep the hash instead, we should find ways to optimize the way we fetch hashes. Potentially by skipping work already done, checking a file's modification dates etc... |
So after properly profiling the code using valgrind. Even after the "optimization" mentioned above the three critical points in term of performance are:
The bigger the project, The bigger and more dominant the complexity of Now for 1 we can't do much. They are library calls so all we can do is decrease the number of calls we make to
This way we can keep a simple array used for For |
No matter the performance, the first goal would be to produce correct results. Regarding the optimizations, my first thought would be that, once a process has ended, its data can probably be removed, as they will not be needed any more. Even if we stick with linear arrays ( However, such optimizations are not the current goal. |
@zvr Well an update, after some extra tests. It's the file reading that cripples performance, not the hash itself. So that's the best we can get! Unless we can improve the way we read from disk somehow(hard to imagine). |
@zvr What about two options?
These give us respectively the ability to:
All of this is food for thought by the way. |
@zvr After some profiling, turns out a massive performance hit occurs due to our "file entries re-usage".
While compiling the xbps package manager, i made the following measurements:
(All in CPU time)
without build recorder wrapping it: 6.934s
with build recorder wrapping it: 14.988s
That's an overhead of 8.064s, of which 5.180s is due to
get_file_hash
function call atsrc/tracer.c:255
.This raises the question, do we really need to search using the hash as well? Absolute paths are unique to begin with and we are already tracking
rename(2)
calls. This paired with the fact that thefinfo
array is sorted in ascending order with respect to time, is a guarantee that the first match on a backward search is the file we are looking for.Is my logic flawed? Is the hash check there important?
The text was updated successfully, but these errors were encountered: