
[8.15](backport #4932) Update event logger configuration via Fleet and environment variable (container command) #5109

Merged
merged 3 commits into from
Jul 15, 2024

Conversation

mergify[bot]
Contributor

@mergify mergify bot commented Jul 10, 2024

What does this PR do?

If Fleet sends an event logging output configuration different from the one currently in use, save it to the encrypted store and re-exec the Elastic-Agent so it uses the new configuration.

This PR adds the ability to receive event logger configuration via Fleet. Previously only the log level was received via Fleet and persisted.

Fleet can store the logging configuration in the policy via the overrides option. Whenever the Elastic-Agent receives the policy (including at startup), it correctly parses this configuration. This PR enables those values to be used to configure the event logger.

When a policy is received, the policy handler compares agent.logging.event_data.to_stderr and agent.logging.event_data.to_files with its current values. If the policy contains different values, they're persisted in the disk store and the Elastic-Agent re-execs. When it restarts, it reads the new values from the persistent store and applies them.
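The compare-persist-reexec decision described above can be sketched roughly as follows. This is an illustrative sketch, not the actual handler code: the type and function names are invented for clarity.

```go
package main

import "fmt"

// EventLogCfg mirrors the two settings the policy handler compares
// (agent.logging.event_data.to_stderr / to_files). Names are illustrative.
type EventLogCfg struct {
	ToStderr bool
	ToFiles  bool
}

// changed reports whether the incoming policy carries different
// event logger settings than the ones currently in use.
func changed(current, incoming EventLogCfg) bool {
	return current != incoming
}

func main() {
	current := EventLogCfg{ToStderr: false, ToFiles: true}
	incoming := EventLogCfg{ToStderr: true, ToFiles: false}

	if changed(current, incoming) {
		// In the real agent: persist the incoming settings to the
		// encrypted disk store, then re-exec so they apply at startup.
		fmt.Println("persist and re-exec")
	} else {
		fmt.Println("no change, keep running")
	}
}
```

Restarting rather than reconfiguring in place means the decision point is just this one comparison per policy update.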

Note for reviewers

This PR is built on top of #4909, hence I'm keeping it in draft until #4909 is merged and I can rebase onto main.

Because this PR enables changing the log output (from disk to stderr or vice versa), I believe it's better to simply restart the Elastic-Agent instead of trying to apply the change at runtime. This keeps the logs consistent and avoids possible race conditions or the need to lock the logger while stopping and starting outputs.

Why is it important?

Once #4549 is merged, the event logger would no longer be configurable via Fleet; this PR adds support for configuring it via Fleet.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Disruptive User Impact

How to test this PR locally

Test the container command

  1. Get a VM and a Stack deployment (ESS is easier); the deployment needs to be accessible from the VM
  2. Get mock-es and run it to return StatusNotAcceptable on all requests: mock-es -nonindex 100
  3. Add an output to Fleet pointing to mock-es
  4. Create a log file with a few lines; the content doesn't matter as long as there are at least a few lines
  5. Create an Elastic-Agent policy using the output you just created
  6. Add the "Custom Logs Integration" pointing to the file you created
  7. Export the following environment variables, adjust the values to your environment:
    • FLEET_ENROLL=1
    • FLEET_URL=https://fleet-server.elastic.co:443
    • FLEET_ENROLLMENT_TOKEN=c2VjcmV0LXRva2Vu
    • EVENTS_TO_STDERR=true
  8. Run the Elastic-Agent container command: ./elastic-agent container
  9. Assert you can see the dropped event logs on stderr:
    {
      "log.level": "warn",
      "@timestamp": "2024-06-13T17:17:39.365Z",
      "message": "Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Meta:null, Fields:null, Private:interface {}(nil), TimeSeries:false}, Flags:0x1, Cache:publisher.EventCache{m:mapstr.M(nil)}, EncodedEvent:(*elasticsearch.encodedEvent)(0xc0031fd480)} (status=406): , dropping event!",
      "component": {
        "binary": "filebeat",
        "dataset": "elastic_agent.filebeat",
        "id": "log-mock-es-4ba82ed3_5321_43cc_9ac4_ac2737e02545",
        "type": "log"
      },
      "log": {
        "source": "log-mock-es-4ba82ed3_5321_43cc_9ac4_ac2737e02545"
      },
      "ecs.version": "1.6.0",
      "log.logger": "elasticsearch",
      "log.origin": {
        "file.line": 490,
        "file.name": "elasticsearch/client.go",
        "function": "github.com/elastic/beats/v7/libbeat/outputs/elasticsearch.(*Client).applyItemStatus"
      },
      "service.name": "filebeat",
      "log.type": "event"
    }

Fleet-Managed Elastic-Agent

  1. Get a VM and a Stack deployment (ESS is easier); the deployment needs to be accessible from the VM
  2. Get mock-es and run it to return StatusNotAcceptable on all requests: mock-es -nonindex 100
  3. Add an output to Fleet pointing to mock-es
  4. Create a log file with a few lines; the content doesn't matter as long as there are at least a few lines
  5. Create an Elastic-Agent policy using the output you just created
  6. Install the Elastic-Agent enrolling in the policy you just created
  7. Add the new logging settings to the policy using the API/DevTools. You'll need the policy ID and policy name:
    PUT kbn:/api/fleet/agent_policies/071c91f3-8fce-480e-b625-381d70f98e53
    {
      "name": "aafoobar",
      "namespace": "default",
      "overrides": {
        "agent": {
          "logging": {
            "event_data": {
              "to_stderr": true,
              "to_files": false
            }
          }
        }
      }
    }
    
  8. Add the "Custom Logs Integration" pointing to the file you created
  9. Assert you can see the dropped event logs on stderr. Run journalctl -lfu elastic-agent.service and ensure you see logs like this:
    {
      "log.level": "warn",
      "@timestamp": "2024-06-13T17:17:39.365Z",
      "message": "Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Meta:null, Fields:null, Private:interface {}(nil), TimeSeries:false}, Flags:0x1, Cache:publisher.EventCache{m:mapstr.M(nil)}, EncodedEvent:(*elasticsearch.encodedEvent)(0xc0031fd480)} (status=406): , dropping event!",
      "component": {
        "binary": "filebeat",
        "dataset": "elastic_agent.filebeat",
        "id": "log-mock-es-4ba82ed3_5321_43cc_9ac4_ac2737e02545",
        "type": "log"
      },
      "log": {
        "source": "log-mock-es-4ba82ed3_5321_43cc_9ac4_ac2737e02545"
      },
      "ecs.version": "1.6.0",
      "log.logger": "elasticsearch",
      "log.origin": {
        "file.line": 490,
        "file.name": "elasticsearch/client.go",
        "function": "github.com/elastic/beats/v7/libbeat/outputs/elasticsearch.(*Client).applyItemStatus"
      },
      "service.name": "filebeat",
      "log.type": "event"
    }

Related issues

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

Closes #4874


This is an automatic backport of pull request #4932 done by Mergify.

@mergify mergify bot requested a review from a team as a code owner July 10, 2024 19:58
@mergify mergify bot added backport conflicts There is a conflict in the backported pull request labels Jul 10, 2024
@mergify mergify bot requested review from michalpristas and andrzej-stencel and removed request for a team July 10, 2024 19:58
Contributor Author

mergify bot commented Jul 10, 2024

Cherry-pick of 72c1ebd has failed:

On branch mergify/bp/8.15/pr-4932
Your branch is up to date with 'origin/8.15'.

You are currently cherry-picking commit 72c1ebddf9.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   NOTICE.txt
	new file:   changelog/fragments/1719345278-container.yaml
	modified:   internal/pkg/agent/application/actions/handlers/handler_action_policy_change.go
	modified:   internal/pkg/agent/application/actions/handlers/handler_action_policy_change_test.go
	modified:   internal/pkg/agent/application/managed_mode.go
	modified:   internal/pkg/agent/cmd/container.go
	modified:   testing/integration/container_cmd_test.go
	new file:   testing/integration/event_logging_test.go
	modified:   testing/integration/logs_ingestion_test.go

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   go.mod
	both modified:   go.sum

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

…(container command) (#4932)

This commit adds the ability to receive event logger output
configuration via Fleet. Previously only the log level was received
via Fleet and persisted.

Fleet can store the event logger configuration in the overrides
section from the policy, allowing users to change and persist this
configuration. The Elastic-Agent needs this configuration at startup,
so whenever the Elastic-Agent receives a new policy from Fleet, it
compares the event logging output configuration with its current one,
if it is different, it is persisted to disk and the Elastic-Agent
re-execs. When it re-starts it reads the new values from the
persistent store and applies them.

---------

Co-authored-by: Pierre HILBERT <[email protected]>
(cherry picked from commit 72c1ebd)
@belimawr belimawr force-pushed the mergify/bp/8.15/pr-4932 branch from a7a6ed4 to 0e8afe7 Compare July 10, 2024 20:18
@cmacknz cmacknz enabled auto-merge (squash) July 10, 2024 21:50
@pierrehilbert pierrehilbert added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Jul 11, 2024
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@cmacknz
Member

cmacknz commented Jul 11, 2024

TestEventLogOutputConfiguredViaFleet failed, this was a test added as part of this PR series.

@belimawr
Contributor

I've been trying, but I cannot manage to reproduce this failure... I'll re-run the job to see what happens.

Contributor Author

mergify bot commented Jul 15, 2024

This pull request has not been merged yet. Could you please review and merge it @belimawr? 🙏


Quality Gate failed

Failed conditions
39.3% Coverage on New Code (required ≥ 40%)

See analysis details on SonarQube

@cmacknz cmacknz merged commit ced6bc6 into 8.15 Jul 15, 2024
9 of 13 checks passed
@cmacknz cmacknz deleted the mergify/bp/8.15/pr-4932 branch July 15, 2024 17:29