-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Filestream store GC, entries are now removed when they TTL expires #38488
Conversation
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
|
This pull request is now in conflicts. Could you fix it? 🙏
|
0786577
to
11fa75e
Compare
Pinging @elastic/elastic-agent (Team:Elastic-Agent) |
💚 Build Succeeded
History
cc @belimawr |
💚 Build Succeeded
History
cc @belimawr |
💔 Build Failed
Failed CI StepsHistory
cc @belimawr |
💚 Build Succeeded
History
cc @belimawr |
💚 Build Succeeded
History
cc @belimawr |
💚 Build Succeeded
History
cc @belimawr |
💚 Build Succeeded
History
cc @belimawr |
💚 Build Succeeded
History
cc @belimawr |
💚 Build Succeeded
History
cc @belimawr |
💚 Build Succeeded
History
cc @belimawr |
filebeat.Stop() | ||
|
||
registryLogFile := filepath.Join(tempDir, "data/registry/filebeat/log.json") | ||
readFilestreamRegistryLog(t, registryLogFile, "remove", 10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I thought we never used remove
but just updated the TTL
field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Me too and that was the bug! 🤯
The resources were never correctly released, so the gcClean
would never for any entry.
They way Filestream registry works is:
- Whenever an entry is "removed", its TTL is set to zero.
- Periodically the
gcStore
runs, any resource that is finished is then removed from the in-memory store and have aremove
operation added to the log file.
When Find
is called, it locks the resource by calling resource.Retain
beats/filebeat/input/filestream/internal/input-logfile/store.go
Lines 428 to 431 in 9fc2160
if resource := s.table[key]; resource != nil && !resource.isDeleted() { | |
resource.Retain() | |
return resource | |
} |
Release
func (r *resource) Release() { r.pending.Dec() } |
There were a few places never calling Release
🤦♂️, hence Finished
func (r *resource) Finished() bool { return r.pending.Load() == 0 } |
op: remove
to the registry log file. 🤯 🤯 🤯 🤯 🤯 🤯 🤯 🤯 🤯
026195a
to
96a28f2
Compare
Changelog updated, rebased onto |
a704351
to
510e462
Compare
510e462
to
31b2b12
Compare
Filestream keeps a `resource` for each file it is harvesting, this resource gets written to the registry, both in-memory and on-disk. Even when `clean_removed` was enabled the resource would not be fully released and never removed from memory. While that did not affect offset tracking, it would not release memory when it should.
31b2b12
to
73b6eaa
Compare
Proposed commit message
The
resources
from Filestream registry have a counter to indicate how many 'owners' have got a hold of that resource, this counter was not correctly decremented. Because it never reached zero, no entry was ever removed from the in-memory store and even though the store GC would run periodically, no resource could be removed. That caused the in-memory store to be ever growing and theop: remove
never to be seen in the registry log file.This commit fixes this bug by correctly calling
Released
in every resource that is retained.Checklist
[ ] I have made corresponding changes to the documentation[ ] I have made corresponding change to the default configuration filesCHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.## Author's ChecklistHow to test this PR locally
1. Create the following configuration file
2. Build and start Filebeat
3. Create a log file and add some data
4. Wait a few seconds until the file is harvested
You can check the output file or look the logs for the logs stating the harvester has been closed
5. Delete the file
6. Wait for the store cleanup logs
7. Assert that the entry has also been removed from the registy
You should be able to find the following lines in the registry:
set
operation is settingttl:0
. That is how the entry is "removed" from the registryremove
operation removes any reference to that state and also removes all in-memory references to theresource
pointing to this file.Related issues
clean_removed
does not work as expected #36761## Use cases## Screenshots## Logs