
storage: file trimming improvement #7852

Merged: 6 commits into tiger-1.8.15 from leonardo-tiger-file-trimming-improvement on Sep 12, 2023

Conversation

leonardo-albertovich
Collaborator

This PR adds a new option named storage.trim_files which can be used to control the file trimming behavior in chunkio.

This is necessary to address an issue where overzealous trimming of chunk files caused excessive file fragmentation in certain XFS deployments.

The default for storage.trim_files is off, which deviates from the previous default behavior and is something we might want to be mindful of.
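
A minimal configuration sketch showing where the new option goes (the surrounding values and the storage path are illustrative, not a recommendation):

```
[SERVICE]
    flush                  1
    # illustrative storage path
    storage.path           /var/log/flb-storage/
    # opt back in to the previous trimming behavior; the new default is off
    storage.trim_files     on
```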

" %s/%s", ch->st->name, ch->name);
/* File trimming has been made opt-in because it causes
* performance degradation and excessive fragmentation
* in XFS.


Specifically here, it's because the cio code was telling the file system very conflicting things. It was saying that a file was going to be X MB in size, and then saying it was going to be Y bytes (the log message length) in size, which leads the file system to make allocation decisions based on those bits of information. Decisions like "oh, this file isn't going to grow, I can put this other thing here", which is an assumption that's invalidated as soon as the log file grows. Given an appropriate loop over an appropriate amount of time, the fragmentation starts to hurt.

I'm not sure which file system would behave well with the previous behavior of "allocate to X MB, truncate to Y bytes, keep appending with an explicit truncate until you get to X MB", but I doubt any of them behaves well on purpose; i.e., I don't know why you would keep this behavior around even as an option.

I'm not super familiar with fluent-bit and cio, but if this code now ends up going "oh, the config says this file will be 2 MB, so just allocate 2 MB and don't truncate it down unless we are 100% sure it will not grow", then I think that will give the file system allocator the best opportunity to make good decisions.

If it is doing that, and you reserve enough space at the end of a log file to be able to write an "oh no, out of disk space" log message, then you probably have a nice "graceful" way to deal with ENOSPC conditions. (Copy-on-write and thin provisioning are a whole other thing, but from what I understand here, that will be pretty unlikely for any individual file.)
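
A minimal sketch of the two allocation patterns being contrasted here (illustrative only, not the chunkio code; posix_fallocate/ftruncate stand in for whatever calls cio actually makes):

```c
#include <fcntl.h>
#include <unistd.h>

#define EXPECTED_SIZE (2 * 1024 * 1024)          /* e.g. a 2 MB chunk limit */

/* (a) The pattern that invites fragmentation: the file system is first told
 * the file will be EXPECTED_SIZE, then immediately told it is only a few
 * bytes long, over and over as the chunk grows. */
static void append_then_trim(int fd, const void *buf, size_t len, off_t used)
{
    posix_fallocate(fd, 0, EXPECTED_SIZE);       /* "this file will be 2 MB" */
    pwrite(fd, buf, len, used);
    ftruncate(fd, used + (off_t) len);           /* "no, it is only Y bytes" */
}

/* (b) The pattern that keeps the size hint consistent: reserve the expected
 * size once and only trim when the chunk is finalized and will not grow. */
static void append_no_trim(int fd, const void *buf, size_t len, off_t used)
{
    posix_fallocate(fd, 0, EXPECTED_SIZE);       /* reserve once, keep it */
    pwrite(fd, buf, len, used);
}
```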

@leonardo-albertovich
Collaborator Author


Fluent-bit doesn't pre-allocate the expected file size yet (that's one opt-in improvement we're considering for the current version). Instead, when cio detects that the file is not large enough to fit the contents it wants to append, it grows the file in increments of 8 * page_size (which in most cases means 32 KiB) until it reaches the required size.
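
As an illustration of that growth strategy, here is a sketch (not the actual cio code; names and error handling are simplified):

```c
#include <sys/types.h>
#include <unistd.h>

/* Round the required size up to the next multiple of 8 * page_size
 * (8 * 4096 = 32 KiB on most systems) and extend the file to that size.
 * Sketch only: the real chunkio implementation differs in structure. */
static int grow_chunk_file(int fd, off_t current_size, off_t required_size)
{
    off_t step = 8 * (off_t) sysconf(_SC_PAGESIZE);
    off_t new_size = ((required_size + step - 1) / step) * step;

    if (new_size <= current_size) {
        return 0;                      /* already large enough */
    }

    return ftruncate(fd, new_size);    /* extend the file */
}
```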

With the current chunk file size limit implementation in fluent-bit, pre-allocating files wouldn't guarantee that we never have to grow them, because the limit is enforced after the contents are appended, but that could be easily fixed.

Thanks for taking a look at this. If you have any questions or remarks please let me know; I'd be glad to go over it with you to make sure there are no corner cases that weren't addressed.

@RicardoAAD
Collaborator

@leonardo-albertovich, the issue is no longer reproducible with this Test Branch.

@RicardoAAD
Collaborator

Hello @leonardo-albertovich

When testing this fix with the storage.checksum option enabled, we see several chunks with a "format check failed" error.

[2023/09/01 21:29:15] [error] [storage] format check failed: tail.0/10413-1693603751.284017178.flb
[2023/09/01 21:29:15] [error] [storage] format check failed: tail.0/10413-1693603751.364724849.flb
[2023/09/01 21:29:15] [error] [storage] format check failed: tail.0/10413-1693603751.481694472.flb

Fluent Bit ran for 5 minutes and produced 862 of these errors.

$ grep "format check failed"  test-log.log | wc -l
862
[SERVICE]
    grace                      0
    flush                      1
    log_level                  info
    log_file                   ./test-log.log
    http_server                off
    storage.path               /mnt/test-disk/
    storage.trim_files         true
    storage.checksum           on
    storage.max_chunks_up      1

[INPUT]
    refresh_interval           1
    name                       tail
    read_from_head             on
    path                       logs/*.log
    storage.type               filesystem
    buffer_chunk_size          2M
    buffer_max_size            2M
    tag                        <fn>
    tag_regex                  (?<fn>.*)

[FILTER]
    Name                       modify
    Match                      *
    Add                        Service1 SOMEVALUE
    Add                        Service3 SOMEVALUE3

[OUTPUT]
    name                       http
    match                      *
    format                     json_lines
    host                       127.0.0.1
    port                       8443
    retry_limit                False
    tls                        on
    tls.verify                 off
    workers                    1
    storage.total_limit_size   9100M

Could you please advise if this is expected with this option enabled?

Thanks.

@lecaros
Contributor

lecaros commented Sep 4, 2023

ping @edsiper @leonardo-albertovich

@leonardo-albertovich
Collaborator Author

@RicardoAAD this should not happen and I have not observed this previously. Please share a copy of those corrupted files with me so I can take a look at them. Meanwhile I'll try to reproduce the issue.

@leonardo-albertovich
Collaborator Author

@RicardoAAD I have been running fluent-bit for 10 minutes and didn't see that error once, so I think I'll need some input from your side.

@RicardoAAD
Collaborator

Thanks @leonardo-albertovich. Please let us know if you need any additional information from the repro that we showed you today.

Regards,

This bug had already been fixed upstream: the round operation causes
the mapping size (and thus the alloc size) to be larger than the file
size, which means there is a memory area that is not backed by the file,
which on some file systems such as XFS causes data loss.

Signed-off-by: Leonardo Alminana <[email protected]>
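
For context, a minimal sketch of the failure mode described in that commit message (illustrative only, not the chunkio code): if the mapping length is rounded up past the file size, writes into the tail of the mapping land in pages that are not backed by the file, so that data never reaches disk. Keeping the file at least as large as the mapping avoids the mismatch.

```c
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Sketch: map `map_size` bytes of `fd`, extending the file first so every
 * mapped page is backed by the file.  If map_size were simply rounded up
 * beyond the file size and the file left alone, data written into the part
 * of the mapping that lies past the end of the file could be lost. */
static void *map_chunk(int fd, size_t file_size, size_t map_size)
{
    if (map_size > file_size) {
        if (ftruncate(fd, (off_t) map_size) == -1) {   /* grow the file */
            perror("ftruncate");
            return NULL;
        }
    }

    void *p = mmap(NULL, map_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }

    return p;
}
```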
@leonardo-albertovich
Collaborator Author

@RicardoAAD could you please re-test?

@RicardoAAD
Collaborator

Hi @leonardo-albertovich, I tested the new changes, and there are no issues with the checksum now.

Thanks.

@edsiper merged commit 4c7f71a into tiger-1.8.15 on Sep 12, 2023
3 checks passed
@edsiper deleted the leonardo-tiger-file-trimming-improvement branch on September 12, 2023, 21:38