Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix flattened data_stream.* fields under Elastic-Agent #36607

Closed
wants to merge 2 commits into from

Conversation

belimawr
Copy link
Contributor

@belimawr belimawr commented Sep 18, 2023

Proposed commit message

If the data_stream is not flattened, Beats will receive it twice, in the Config.Source and as DataStream, that is the normal behaviour when the Elastic-Agent is managed by Fleet and was causing a duplicated key error.

This commit fixes that by only returning the error if the values in Config.Source and DataStream are different.

The tests now cover this case.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
    - [ ] I have made corresponding changes to the documentation
    - [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
    - [ ] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

## Author's Checklist
## How to test this PR locally
## Related issues
Closes elastic/elastic-agent#3191
## Use cases
## Screenshots
## Logs

If the data_stream is not flattened, Beats will receive it twice, in
the Config.Source and as DataStream, that is the normal behaviour when
the Elastic-Agent is managed by Fleet and was causing a duplicated key
error.

This commit fixes that by only returning the error if the values in
Config.Source and DataStream are different.

The tests now cover this case.
@belimawr belimawr requested a review from a team as a code owner September 18, 2023 17:20
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Sep 18, 2023
@belimawr belimawr requested a review from cmacknz September 18, 2023 17:20
@mergify
Copy link
Contributor

mergify bot commented Sep 18, 2023

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @belimawr? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@cmacknz cmacknz added the Team:Elastic-Agent Label for the Agent team label Sep 18, 2023
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Sep 18, 2023
Copy link
Member

@cmacknz cmacknz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the root cause is likely that the Elastic Agent itself doesn't parse the flattened variant into the DataStream protobuf structure.

The point of that DataStream object is so that inputs don't actually have to parse the yaml, it is pre-parsed by the Elastic Agent. Doing this in the Elastic Agent also avoids requiring each input to know about and handle the syntax variations, they should just be able to look at the DataStream object as an abstraction for the YAML.

There should be no difference in the protocol messaging regardless of which syntax is used in the underlying configuration.

x-pack/libbeat/management/managerV2_test.go Outdated Show resolved Hide resolved
Comment on lines +589 to +593
// If the data_stream is not flattened, Beats will receive
// it twice, in the Config.Source and as DataStream, that is the
// normal behaviour when the Elastic-Agent is managed by Fleet.
// So we also add the DataStream here to make sure there will be
// no conflict.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because the DataStream object from the Elastic Agent doesn't parse the flattened variant? There is no reason for these two cases to be different in the protocol.

It's good to defend against this problem, but the protocol should be consistent or other developers are going to have this exact same bug.

I think we should follow up and fix this in the agent as well so that there is no inconsistency here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because the DataStream object from the Elastic Agent doesn't parse the flattened variant?

Yes. If I recall correctly, the Agent parses it directly into the Protobuf type that does not support the flattened syntax.

It's good to defend against this problem, but the protocol should be consistent or other developers are going to have this exact same bug.

I think we should follow up and fix this in the agent as well so that there is no inconsistency here.

Should we then revert the original commit supporting flattened keys on Beats and re-implement it on Elastic-Agent?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we then revert the original commit supporting flattened keys on Beats and re-implement it on Elastic-Agent?

Yes, we need to change the agent as well or every input is going to need to handle these two cases which is definitely result in bugs.

@@ -118,15 +118,15 @@ func deDotDataStream(raw dataStreamAndSource) (*proto.DataStream, error) {
return nil, fmt.Errorf("cannot unpack source field into struct: %w", err)
}

if ds.Dataset != "" && tmp.DataStream.Dataset != "" {
if ds.Dataset != "" && tmp.DataStream.Dataset != "" && (ds.Dataset != tmp.DataStream.Dataset) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message here isn't really right anymore, the key is supposed to be in both places. The only error is if they don't match.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they're the same it's not a problem, that's how the Elastic-Agent sends the data when the data_stream object is not flattened.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error is duplicated key 'datastream.dataset but the key being in both places actually the correct and expected result. The error now is only if they don't match, which the Elastic Agent should guarantee is impossible because the DataStream object is parsed out of the Source on the agent side but being defensive is fine.

@elasticmachine
Copy link
Collaborator

elasticmachine commented Sep 18, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-09-18T17:48:28.210+0000

  • Duration: 63 min 19 sec

Test stats 🧪

Test Results
Failed 0
Passed 8323
Skipped 379
Total 8702

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@mergify
Copy link
Contributor

mergify bot commented Sep 18, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b fix-dedot-datastream upstream/fix-dedot-datastream
git merge upstream/main
git push upstream fix-dedot-datastream

@belimawr
Copy link
Contributor Author

Closing as we decided to revert the commit that introduced the bug and re-implement it in Elastic-Agent.

Revert PR: #36610

@belimawr belimawr closed this Sep 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Elastic-Agent Label for the Agent team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support flattened data_stream fields
3 participants