Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Start/stop monitoring server based on monitoring config #3492

Merged

Conversation

michalpristas
Copy link
Contributor

@michalpristas michalpristas commented Oct 2, 2023

What does this PR do?

Adds a capability to turn off monitoring server in case monitoring metrics is disabled.

Why is it important?

Up until now monitoring server was always turned on listening on a monitoring port even though monitoring was not enabled.

TODO: check with cloud they have monitoring enabled in a config

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

How to test

  • install standalone agent
  • play with agent.monitoring.enabled and agent.monitoring.metrics flags
  • any of them set to false should stop server
  • verify server is in desired state running sudo lsof -i :6791. for server stopped no output, for server running LISTEN and metricbeat connecting to it

Fixes #2734

@elasticmachine
Copy link
Contributor

elasticmachine commented Oct 2, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-10-06T13:07:12.343+0000

  • Duration: 40 min 23 sec

Test stats 🧪

Test Results
Failed 0
Passed 6521
Skipped 59
Total 6580

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@michalpristas
Copy link
Contributor Author

seems cloud does not disable monitoring

@elasticmachine
Copy link
Contributor

elasticmachine commented Oct 2, 2023

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 98.795% (82/83) 👍 0.015
Files 67.114% (200/298) 👍 0.111
Classes 65.766% (365/555) 👍 0.124
Methods 52.99% (1152/2174) 👍 0.013
Lines 38.535% (13133/34081) 👎 -0.018
Conditionals 100.0% (0/0) 💚

@michalpristas michalpristas added Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team and removed needs_team labels Oct 4, 2023
@michalpristas michalpristas marked this pull request as ready for review October 4, 2023 09:06
@michalpristas michalpristas requested a review from a team as a code owner October 4, 2023 09:06
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

# - security: impacts on the security of a product or a user’s deployment.
# - upgrade: important information for someone upgrading from a prior version
# - other: does not fit into any of the other categories
kind: feature
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think strictly speaking this should be an enhancement since it's not completely new functionality :)

coord.runLoopIteration(ctx)
assert.True(t, cfgChange.acked, "Coordinator should ACK a successful policy change")

// server is started by default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment should be changed?

coord.runLoopIteration(ctx)
assert.True(t, cfgChange.acked, "Coordinator should ACK a successful policy change")

// server is started by default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment should be changed since we are explicitly enabling monitoring in the policy now (as opposed to it being enabled by default)?

coord.runLoopIteration(ctx)
assert.True(t, cfgChange.acked, "Coordinator should ACK a successful policy change")

// server is started by default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment should be changed?

@ycombinator
Copy link
Contributor

ycombinator commented Oct 4, 2023

Just tested this PR and I'm not sure it's working as expected. Or maybe I'm doing something wrong in my test?

$ sudo elastic-agent version    # Agent is not installed yet
sudo: elastic-agent: command not found
$ sudo lsof -i:6789 -P    # no output, since Agent is not installed 👍 
$ cat elastic-agent.yml| yq '."agent.monitoring".enabled'
false
$ sudo ./elastic-agent install -n
Installing in non-interactive mode.
Copying files....................................................................... DONE
Installing service... DONE
Starting service... DONE
Elastic Agent has been successfully installed.
$ sudo elastic-agent version
Binary: 8.11.0-SNAPSHOT (build: 37186bde9d953e2f4719dc46e4024fb267fa6deb at 2023-10-04 12:44:37 +0000 UTC)
Daemon: 8.11.0-SNAPSHOT (build: 37186bde9d953e2f4719dc46e4024fb267fa6deb at 2023-10-04 12:44:37 +0000 UTC)
$ sudo lsof -i:6789 -P    # was expecting this to have no output
COMMAND     PID USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
elastic-a 44795 root   11u  IPv4 0xaa050607afc9d715      0t0  TCP localhost:6789 (LISTEN)
elastic-a 44795 root   15u  IPv4 0xaa050607b1e76da5      0t0  TCP localhost:6789->localhost:62110 (ESTABLISHED)
metricbea 44803 root   14u  IPv4 0xaa050607b1e778ed      0t0  TCP localhost:62110->localhost:6789 (ESTABLISHED)

@michalpristas
Copy link
Contributor Author

michalpristas commented Oct 5, 2023

@ycombinator have you changed the config? by default monitoring is enabled
disabling monitoring by default i consider breaking

edit: i see you change it upfront.

@ycombinator
Copy link
Contributor

@ycombinator have you changed the config? by default monitoring is enabled disabling monitoring by default i consider breaking

Yes, I changed the config to explicitly disable monitoring. See the third command in my test output.

@michalpristas
Copy link
Contributor Author

can you check indentation or if value is not commented out in a config. i cannot repro

@ycombinator
Copy link
Contributor

ycombinator commented Oct 5, 2023

can you check indentation or if value is not commented out in a config. i cannot repro

I tried and I'm getting the same behavior as before. 😞

$ sudo elastic-agent version
Binary: 8.12.0-SNAPSHOT (build: 3372c5f434f3521c5201975ce37ea40de3974109 at 2023-10-05 21:11:07 +0000 UTC)
Daemon: 8.12.0-SNAPSHOT (build: 3372c5f434f3521c5201975ce37ea40de3974109 at 2023-10-05 21:11:07 +0000 UTC)

$ sudo elastic-agent inspect
agent:
  monitoring:
    enabled: false
inputs:
- data_stream.namespace: default
  id: unique-system-metrics-input
  streams:
  - data_stream.dataset: system.cpu
    metricsets:
    - cpu
  - data_stream.dataset: system.memory
    metricsets:
    - memory
  - data_stream.dataset: system.network
    metricsets:
    - network
  - data_stream.dataset: system.filesystem
    metricsets:
    - filesystem
  type: system/metrics
  use_output: default
outputs:
  default:
    api_key: example-key
    hosts:
    - 127.0.0.1:9200
    type: elasticsearch


$ sudo lsof -i:6789
COMMAND     PID USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
elastic-a 88224 root   15u  IPv4 0xaa050607ba841365      0t0  TCP localhost:smc-https (LISTEN)
elastic-a 88224 root   16u  IPv4 0xaa050607b1e76da5      0t0  TCP localhost:smc-https->localhost:56672 (ESTABLISHED)
metricbea 88232 root   22u  IPv4 0xaa050607b1e74bcd      0t0  TCP localhost:56672->localhost:smc-https (ESTABLISHED)

$ ps u 88224
USER   PID  %CPU %MEM      VSZ    RSS   TT  STAT STARTED      TIME COMMAND
root 88224   0.0  0.1 409761568  48816   ??  Ss    2:21PM   0:00.51 /usr/local/bin/elastic-agent

$ date
Thu Oct  5 14:25:38 PDT 2023

Can you try with the same configuration as mine? If you still can't reproduce, maybe we should ask someone else to break the tie or we could get on Zoom next week to sort this out (I'm on PTO tomorrow and Monday).

@michalpristas
Copy link
Contributor Author

i see what's wrong lsof -i :6791 should be used.
6789 is a control protocol port

@ycombinator
Copy link
Contributor

i see what's wrong lsof -i :6791 should be used. 6789 is a control protocol port

doh

Yup, that was it! 🤦

$ netstat -an | grep LISTEN | grep 6791
$ sudo elastic-agent version
Binary: 8.12.0-SNAPSHOT (build: 3372c5f434f3521c5201975ce37ea40de3974109 at 2023-10-05 21:11:07 +0000 UTC)
Daemon: 8.12.0-SNAPSHOT (build: 3372c5f434f3521c5201975ce37ea40de3974109 at 2023-10-05 21:11:07 +0000 UTC)

Copy link
Contributor

@ycombinator ycombinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@elastic-sonarqube
Copy link

@michalpristas michalpristas merged commit 6cc587a into elastic:main Oct 6, 2023
11 checks passed
michalpristas added a commit that referenced this pull request Oct 11, 2023
cmacknz pushed a commit that referenced this pull request Oct 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-skip Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Elastic Agent always starts http monitoring service even if metrics are disabled
3 participants