Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/filelog] fix record counting with header #35870

Merged

Conversation

andrzej-stencel
Copy link
Member

Description

Fixes #35869 by refactoring of the Reader::ReadToEnd method.

This refactors the Reader::ReadToEnd method by separating reading the file's header from reading the file's contents.
This results in very similar code in readHeader and readContents methods, which was previously deduplicated at the cost of slightly higher complexity.
The bug could be fixed without separating header reading from contents reading, but I hope this separation will make it easier to implement content batching in the Reader (#35455).
Content batching was my original motivation for these code changes.
I only discovered the problem with record counting when reading the code.

Link to tracking issue

Fixes #35869

Testing

In the first commit I have added tests that document the erroneous behavior. In the second commit I have fixed the bug and corrected the tests.

Documentation

Added changelog entry.

This refactors the `Reader::ReadToEnd` method
by separating reading the file's header from reading the file's contents.
This results in very similar code in `readHeader` and `readContents` methods,
which was previosly deduplicated at the cost of slightly higher complexity.
The bug could be fixed without separating header reading from contents reading,
but I hope this separation will make it easier to implement content batching
in the Reader (open-telemetry#35455).
Content batching was my original motivation for these code changes.
I only discovered the problem with record counting when reading the code.

// Switch to the normal split and process functions.
r.splitFunc = r.lineSplitFunc
r.processFunc = r.emitFunc
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "juggling" of function references could probably be avoided altogether after the refactoring.

The feature gate `AllowHeaderMetadataParsing` is currently in Beta stage,
which means it is enabled by default.
Some tests were still explicitly enabling the feature gate on test start
and disabling it on test end,
which would cause other tests to fail
that expected the feature gate to be enabled.
@djaglowski djaglowski merged commit ed302a3 into open-telemetry:main Oct 28, 2024
158 checks passed
@github-actions github-actions bot added this to the next release milestone Oct 28, 2024
jpbarto pushed a commit to jpbarto/opentelemetry-collector-contrib that referenced this pull request Oct 29, 2024
)

#### Description

Fixes
open-telemetry#35869
by refactoring of the `Reader::ReadToEnd` method.

This refactors the `Reader::ReadToEnd` method by separating reading the
file's header from reading the file's contents.
This results in very similar code in `readHeader` and `readContents`
methods, which was previously deduplicated at the cost of slightly
higher complexity.
The bug could be fixed without separating header reading from contents
reading, but I hope this separation will make it easier to implement
content batching in the Reader
(open-telemetry#35455).
Content batching was my original motivation for these code changes.
I only discovered the problem with record counting when reading the
code.

#### Link to tracking issue

Fixes
open-telemetry#35869

#### Testing

In the first commit I have added tests that document the erroneous
behavior. In the second commit I have fixed the bug and corrected the
tests.

#### Documentation

Added changelog entry.
@andrzej-stencel andrzej-stencel deleted the refactor-reader-header-reading branch October 30, 2024 08:28
andrzej-stencel added a commit to andrzej-stencel/opentelemetry-collector-contrib that referenced this pull request Oct 31, 2024
Separates header processing functions from content processing functions to improve code clarity.
This is a follow-up from open-telemetry#35870 (comment).
andrzej-stencel added a commit to andrzej-stencel/opentelemetry-collector-contrib that referenced this pull request Oct 31, 2024
Separates header processing functions from content processing functions to improve code clarity.
This is a follow-up from open-telemetry#35870 (comment).
andrzej-stencel added a commit to andrzej-stencel/opentelemetry-collector-contrib that referenced this pull request Oct 31, 2024
Separates header processing functions from content processing functions to improve code clarity.
This is a follow-up from open-telemetry#35870 (comment).
djaglowski pushed a commit that referenced this pull request Oct 31, 2024
Separates header processing functions from content processing functions
to improve code clarity.
This is a follow-up from
#35870 (comment).
ArthurSens pushed a commit to ArthurSens/opentelemetry-collector-contrib that referenced this pull request Nov 4, 2024
…y#36108)

Separates header processing functions from content processing functions
to improve code clarity.
This is a follow-up from
open-telemetry#35870 (comment).
sbylica-splunk pushed a commit to sbylica-splunk/opentelemetry-collector-contrib that referenced this pull request Dec 17, 2024
)

#### Description

Fixes
open-telemetry#35869
by refactoring of the `Reader::ReadToEnd` method.

This refactors the `Reader::ReadToEnd` method by separating reading the
file's header from reading the file's contents.
This results in very similar code in `readHeader` and `readContents`
methods, which was previously deduplicated at the cost of slightly
higher complexity.
The bug could be fixed without separating header reading from contents
reading, but I hope this separation will make it easier to implement
content batching in the Reader
(open-telemetry#35455).
Content batching was my original motivation for these code changes.
I only discovered the problem with record counting when reading the
code.

#### Link to tracking issue

Fixes
open-telemetry#35869

#### Testing

In the first commit I have added tests that document the erroneous
behavior. In the second commit I have fixed the bug and corrected the
tests.

#### Documentation

Added changelog entry.
sbylica-splunk pushed a commit to sbylica-splunk/opentelemetry-collector-contrib that referenced this pull request Dec 17, 2024
…y#36108)

Separates header processing functions from content processing functions
to improve code clarity.
This is a follow-up from
open-telemetry#35870 (comment).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[receiver/filelog] record numbers are incorrect when header is configured
3 participants