[8.x](backport #41952) auditbeat: Add a cached file hasher for auditbeat #41992
+373
−0
Proposed commit message
This implements an LRU cache on top of the FileHasher from hasher.go. It will be used in the new backend for the system process module on Linux.
The cache is indexed by file path and stores the metadata (what we get from stat(2)/statx(2)) along with the hashes of each file.
To hash a file, we stat() it, look up its path in the cache, and compare the result against the stored metadata. If the metadata differs, we rehash; otherwise we use the cached values.
The cache ignores access time (atime); it is only interested in write modifications. If the machine doesn't support statx(2), it falls back to stat(2) but still fills in the same unix.Statx_t.
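As a rough illustration of the atime rule, the sketch below compares two metadata snapshots while ignoring access time. The `fileMeta` struct and the `unchanged` function are hypothetical names for this example; `fileMeta` is a trimmed-down stand-in for unix.Statx_t, not the type used in the PR.

```go
package main

import "fmt"

// fileMeta is a hypothetical, trimmed-down stand-in for the fields of
// unix.Statx_t that matter here; the real struct has more fields.
type fileMeta struct {
	Ino   uint64 // inode number
	Size  uint64 // size in bytes
	Mode  uint16 // file type and permission bits
	Atime int64  // last access, in nanoseconds (ignored by the comparison)
	Mtime int64  // last content modification, in nanoseconds
	Ctime int64  // last status change, in nanoseconds
}

// unchanged reports whether two snapshots describe the same file content.
// Atime is deliberately zeroed on both copies before comparing, so merely
// reading a file does not invalidate its cache entry.
func unchanged(a, b fileMeta) bool {
	a.Atime, b.Atime = 0, 0
	return a == b
}

func main() {
	cached := fileMeta{Ino: 42, Size: 1024, Mode: 0o755, Atime: 100, Mtime: 90, Ctime: 90}

	read := cached
	read.Atime = 200                     // someone read the file
	fmt.Println(unchanged(cached, read)) // true

	written := cached
	written.Size, written.Mtime, written.Ctime = 2048, 300, 300 // someone wrote to it
	fmt.Println(unchanged(cached, written))                     // false
}
```

Because the arguments are passed by value, zeroing Atime inside `unchanged` does not touch the caller's snapshots.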
With this we end up with a stat() + lookup on the hot path, and a stat() + stat() + insert on the cold path.
The motivation is that the new backend ends up fetching "all processes", which in turn causes it to try to hash on every event. The current/old hasher just can't cope with that.
With the cache, things improve considerably: we stay below 5us (200k/s) in all cases.
Checklist
- [ ] I have made corresponding changes to the documentation
- [ ] I have made corresponding changes to the default configuration files
- CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc

This is an automatic backport of pull request #41952 (auditbeat: Add a cached file hasher for auditbeat), done by Mergify.