storage: file trimming improvement #7852
Signed-off-by: Leonardo Alminana <[email protected]>
" %s/%s", ch->st->name, ch->name);
/* File trimming has been made opt-in because it causes
 * performance degradation and excessive fragmentation
 * in XFS.
Specifically here, it's because the behavior of the cio code was telling the file system very conflicting things. It was saying that a file was going to be X MB in size, and then saying it was going to be Y bytes (the log message length) in size, which leads the file system to make allocation decisions based on those bits of information. Decisions like "oh, this file isn't going to grow, I can put this other thing here", which is an assumption that's invalidated as soon as the log file grows. Given an appropriate loop over an appropriate amount of time, the fragmentation starts to hurt.
I'm not sure what file system would behave well with the previous behavior of "allocate to X MB, truncate to Y bytes, keep appending with an explicit truncate until you get to X MB", but I doubt it's behaving well on purpose, i.e. I don't know why you would keep this behavior around even as an option.
I'm not super familiar with fluent-bit and cio, but if this code is now ending up going "oh, the config says this file will be 2MB, so just allocate 2MB and don't truncate it down unless we are 100% sure it will not grow", then I think that will give the file system allocator the best opportunity to make good decisions.
If it is doing that, and you reserve enough space at the end of a log file to be able to write a log message of "oh no, out of disk space", then you probably have a nice "graceful" way to deal with ENOSPC conditions. (Well, copy-on-write and thin provisioning are a whole other thing, but from what I understand here, that will be pretty unlikely for any individual file.)
Fluent-bit doesn't pre-allocate the expected file size yet (that's one opt-in improvement we're considering for the current version). Instead, when cio detects that the file is not large enough to fit the contents it wants to append, it grows the file in increments of 8 * page_size (which in most cases means 32 KB) until it reaches the required size.
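The incremental growth described above can be sketched as follows. This is only an illustration under stated assumptions, not the chunkio implementation: `grow_in_increments` is a hypothetical helper name, and the 4096-byte page size mentioned in the note below is an assumed value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the chunkio API): returns the file size
 * reached after growing `current` in fixed `increment` steps until it is
 * at least `required` bytes. */
size_t grow_in_increments(size_t current, size_t required, size_t increment)
{
    assert(increment > 0);

    /* Keep adding one increment at a time until the contents fit. */
    while (current < required) {
        current += increment;
    }

    return current;
}
```

Assuming a 4096-byte page, the increment would be 8 * 4096 = 32768 bytes, so a file that currently holds 32768 bytes and needs to hold 40000 bytes would be grown to 65536 bytes.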
With the current chunk file size limit implementation in fluent-bit, pre-allocating files doesn't guarantee that we won't have to increase their size, because the limit is imposed after the contents are appended, but that could be easily fixed.
Thanks for taking a look at this. If you have any questions or remarks please let me know; I'd be glad to go over it with you to be sure that there are no corner cases that weren't addressed.
@leonardo-albertovich, the issue is no longer reproducible with this test branch.
Hello @leonardo-albertovich. When testing this fix with the option storage.checksum enabled, we see several chunks with a "format check failed" error.
Fluent Bit ran for 5 minutes and got 862 of these errors.
Could you please advise if this is expected with this option enabled? Thanks.
@RicardoAAD this should not happen and I have not observed this previously. Please share a copy of those corrupted files with me so I can take a look at them. Meanwhile I'll try to reproduce the issue.
@RicardoAAD I have been running fluent-bit for 10 minutes and didn't see that error once, so I think I'll need some input from your side.
Thanks @leonardo-albertovich, please let us know if you need any additional information from the repro that we showed you today. Regards,
This bug had already been fixed upstream: the rounding operation causes the mapping size (and thus the allocation size) to be larger than the file size, which means there is a memory area that's not backed by the file, and in some cases, such as on XFS, this causes data loss.
Signed-off-by: Leonardo Alminana <[email protected]>
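To illustrate the size mismatch this commit message describes, here is a minimal sketch of how rounding a size up to a page boundary can leave part of a mapping without file backing. Both function names are hypothetical and the code is not taken from chunkio; it only shows the arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: rounds `size` up to the next multiple of `page_size`. */
size_t round_up_to_page(size_t size, size_t page_size)
{
    assert(page_size > 0);
    return ((size + page_size - 1) / page_size) * page_size;
}

/* Hypothetical helper: number of bytes at the end of a page-rounded mapping
 * that lie beyond the end of a file of `file_size` bytes. */
size_t unbacked_bytes(size_t file_size, size_t page_size)
{
    return round_up_to_page(file_size, page_size) - file_size;
}
```

For example, with an assumed 4096-byte page, a 5000-byte file rounds up to an 8192-byte mapping, so the last 3192 bytes of the mapping have no file contents behind them. Data written to that region is not guaranteed to reach the file unless the file is extended to cover it.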
@RicardoAAD could you please re-test?
Hi @leonardo-albertovich, I tested the new changes, and there are no issues with the checksum now. Thanks.
This PR adds a new option named `storage.trim_files` which can be used to control the file trimming behavior in chunkio. This is necessary in order to address an issue where excessive file fragmentation was caused by overzealously trimming chunk files in certain XFS deployments.
The default value for `storage.trim_files` is `off`, which is a deviation from the previous default behavior and is something we might want to be mindful of.