[Fleet]: Hosted fleet server gets unhealthy on 8.16 Snapshot. #5615
Comments
@amolnater-qasource Kindly review |
Secondary Review for this ticket is Done. |
I'm able to reproduce this simply by creating an
So it seems like the |
Pinged APM server team in Slack for ideas. |
From the Slack discussion, this looks related to the group membership change in the Wolfi container, where the agent user is no longer in gid user. The permissions of the apm-server.yml file make it unreadable to the elastic-agent processes:
|
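For anyone wanting to check this kind of ownership mismatch inside a container, a minimal sketch follows. The helper name `describe_access` is hypothetical and the demo runs against a temporary file; it only prints the current identity, the file's owner, group and mode, and whether the shell considers the file readable.

```shell
# describe_access FILE (hypothetical helper): show who we are, who owns FILE,
# and whether the current user can read it.
describe_access() {
    file=$1
    echo "process identity: $(id)"
    # Long listing: mode bits, owner and group of the file itself.
    ls -ld "$file"
    if [ -r "$file" ]; then
        echo "readable"
    else
        echo "not readable"
    fi
}

# Demo on a throwaway file with owner read/write and group read only.
demo=$(mktemp)
chmod 640 "$demo"
describe_access "$demo"
rm -f "$demo"
```

Inside the agent container the same helper could be pointed at the apm-server.yml file under the components directory; a group mismatch would show up as a group in the listing that is absent from the `id` output.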
Thanks @cmacknz. I'm guessing the fix here is on the APM Server end where the |
Chatting with the APM Server team in Slack, the file ownership fix needs to be made in the Agent packaging step, specifically here (thanks @kruskall for pinpointing): elastic-agent/dev-tools/packaging/templates/docker/Dockerfile.elastic-agent.tmpl Lines 164 to 167 in 8e31a5d
|
Since PR #4925 is merged for 8.16.0-SNAPSHOT, I am wondering whether the only thing missing here is granting the SETPCAP and CHOWN capabilities to the elastic-agent pod. |
@pkoutsovasilis Do we need to do this even after #5616 is merged? |
It should be solved when this is merged. But I am wondering whether merging it will cause issues in the opposite scenario, when somebody runs the image as root without the SETPCAP and CHOWN capabilities 🙂 |
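To see whether a running container actually holds the two capabilities discussed here, one option is to decode the effective capability mask from /proc. This is only a sketch: the helper name `has_cap` is hypothetical, and it assumes a Linux /proc filesystem (it falls back to an all-zero mask otherwise). The bit numbers come from linux/capability.h.

```shell
# has_cap MASK BIT (hypothetical helper): succeed if bit number BIT is set
# in the hexadecimal capability MASK.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Effective capability mask as reported by the kernel. awk is a child of this
# shell and inherits its capabilities, so reading its own status is equivalent.
cap_eff=$(awk '/^CapEff:/ {print $2}' /proc/self/status 2>/dev/null)
cap_eff=${cap_eff:-0000000000000000}

# Bit numbers from linux/capability.h: CAP_CHOWN=0, CAP_SETPCAP=8.
for entry in CHOWN:0 SETPCAP:8; do
    name=${entry%%:*}
    bit=${entry##*:}
    if has_cap "$cap_eff" "$bit"; then
        echo "CAP_$name: present"
    else
        echo "CAP_$name: missing"
    fi
done
```

Running this as the container's user shows whether CHOWN and SETPCAP are in the effective set of that process.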
Issue is still present for agent version 8.16.0-SNAPSHOT
build_time: 2024-09-27T20:51:03Z
commit: 601aaebb1893f1f470531b3f921dc6fb9d965306
snapshot: true
version: 8.16.0
From the diagnostics I pulled from an 8.16.0-SNAPSHOT in CFT we can still see that apm-server is still looking for
{"log.level":"info","@timestamp":"2024-09-30T07:01:42.992Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":632},"message":"Spawned new unit apm-es-containerhost-elastic-cloud-apm: Starting: spawned pid '206'","log":{"source":"elastic-agent"},"component":{"id":"apm-es-containerhost","state":"STARTING"},"unit":{"id":"apm-es-containerhost-elastic-cloud-apm","type":"input","state":"STARTING"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-09-30T07:01:43.089Z","message":"Error: error loading config file: stat apm-server.yml: no such file or directory","component":{"binary":"apm-server","dataset":"elastic_agent.apm_server","id":"apm-es-containerhost","type":"apm"},"log":{"source":"apm-es-containerhost"},"ecs.version":"1.6.0"} |
Rebuilt a new image from the 8.x branch, the ownership of
bash-5.2$ ls -la /usr/share/elastic-agent/
total 3904
drwxrwxrwx 1 elastic- elastic- 4096 Sep 30 08:27 .
drwxr-xr-x 1 root root 4096 Sep 30 08:27 ..
-rw-r--r-- 1 elastic- elastic- 41 Sep 30 08:27 .build_hash.txt
-rw-r--r-- 1 elastic- elastic- 41 Sep 30 08:27 .elastic-agent.active.commit
-rw-r--r-- 1 elastic- elastic- 3860 Sep 30 08:27 LICENSE.txt
-rw-r--r-- 1 elastic- elastic- 3906754 Sep 30 08:27 NOTICE.txt
-rw-r--r-- 1 elastic- elastic- 360 Sep 30 08:27 README.md
drwxrwxrwx 1 elastic- elastic- 4096 Sep 30 08:27 data
lrwxrwxrwx 1 elastic- elastic- 64 Sep 30 08:27 elastic-agent -> /usr/share/elastic-agent/data/elastic-agent-110229/elastic-agent
-rw-r--r-- 1 elastic- elastic- 14829 Sep 30 08:27 elastic-agent.reference.yml
-rw-r--r-- 1 elastic- elastic- 10409 Sep 30 08:27 elastic-agent.yml
-rw-r--r-- 1 elastic- elastic- 376 Sep 30 08:27 manifest.yaml
-rw-r--r-- 1 elastic- elastic- 643 Sep 30 08:27 otel.yml
drwxr-xr-x 2 elastic- elastic- 4096 Sep 30 08:27 otel_samples
-rw-r--r-- 1 elastic- elastic- 85 Sep 30 08:27 otelcol
bash-5.2$ ls -la /usr/share/elastic-agent/data/elastic-agent-110229/components/
total 1119252
drwxrwxrwx 1 elastic- elastic- 4096 Sep 30 08:27 .
drwxrwxrwx 1 elastic- elastic- 4096 Sep 30 08:27 ..
-rw-rw-rw- 1 elastic- elastic- 41 Sep 30 08:27 .build_hash.txt
-rw-rw-rw- 1 elastic- elastic- 13675 Sep 30 08:27 LICENSE.txt
-rw-rw-rw- 1 elastic- elastic- 377163 Sep 30 08:27 NOTICE.pf-elastic-collector.txt
-rw-rw-rw- 1 elastic- elastic- 549000 Sep 30 08:27 NOTICE.pf-elastic-symbolizer.txt
-rw-rw-rw- 1 elastic- elastic- 997964 Sep 30 08:27 NOTICE.pf-host-agent.txt
-rw-rw-rw- 1 elastic- elastic- 96850 Sep 30 08:27 NOTICE.txt
-rw-rw-rw- 1 elastic- elastic- 851 Sep 30 08:27 README.md
-rwxr-xr-x 1 elastic- elastic- 419405240 Sep 30 08:27 agentbeat
-rw-r--r-- 1 elastic- elastic- 16530 Sep 30 08:27 agentbeat.spec.yml
-rwxr-xr-x 1 elastic- elastic- 56025240 Sep 30 08:27 apm-server
-rw-r--r-- 1 elastic- elastic- 542 Sep 30 08:27 apm-server.spec.yml
-rw-r--r-- 1 elastic- elastic- 39322 Sep 30 08:27 apm-server.yml
-rw-rw-rw- 1 elastic- elastic- 273830 Sep 30 08:27 bundle.tar.gz
drwxrwxrwx 2 elastic- elastic- 4096 Sep 30 08:27 certs
-rw-r--r-- 1 elastic- elastic- 3142 Sep 30 08:27 checksum.yml
-rwxr-xr-x 1 elastic- elastic- 104601169 Sep 30 08:27 cloud-defend
-rw-r--r-- 1 elastic- elastic- 442 Sep 30 08:27 cloud-defend.spec.yml
-rwxr-xr-x 1 elastic- elastic- 252076353 Sep 30 08:27 cloudbeat
-rw-r--r-- 1 elastic- elastic- 2541 Sep 30 08:27 cloudbeat.spec.yml
-rw-r--r-- 1 elastic- elastic- 6920 Sep 30 08:27 cloudbeat.yml
-rwxr-xr-x 1 elastic- elastic- 26417288 Sep 30 08:27 endpoint-security
-rw-rw-rw- 1 elastic- elastic- 26978766 Sep 30 08:27 endpoint-security-resources.zip
-rw-r--r-- 1 elastic- elastic- 3608 Sep 30 08:27 endpoint-security.spec.yml
-rwxr-xr-x 1 elastic- elastic- 38035777 Sep 30 08:27 fleet-server
-rw-r--r-- 1 elastic- elastic- 423 Sep 30 08:27 fleet-server.spec.yml
-rw-rw-rw- 1 elastic- elastic- 8621110 Sep 30 08:27 java-attacher.jar
drwxrwxrwx 2 elastic- elastic- 12288 Sep 30 08:27 lenses
drwxrwxrwx 1 elastic- elastic- 4096 Sep 30 08:27 module
-rwxr-xr-x 1 elastic- elastic- 6788339 Sep 30 08:27 osquery-extension.ext
-rwxr-xr-x 1 elastic- elastic- 86504168 Sep 30 08:27 osqueryd
-rwxr-xr-x 1 elastic- elastic- 20934888 Sep 30 08:27 pf-elastic-collector
-rw-r--r-- 1 elastic- elastic- 283 Sep 30 08:27 pf-elastic-collector.spec.yml
-rwxr-xr-x 1 elastic- elastic- 21367848 Sep 30 08:27 pf-elastic-symbolizer
-rw-r--r-- 1 elastic- elastic- 285 Sep 30 08:27 pf-elastic-symbolizer.spec.yml
-rwxr-xr-x 1 elastic- elastic- 75825808 Sep 30 08:27 pf-host-agent
-rw-r--r-- 1 elastic- elastic- 406 Sep 30 08:27 pf-host-agent.spec.yml
bash-5.2$ whoami
elastic-agent
Launching apm-server from
bash-5.2$ cd /usr/share/elastic-agent/data/elastic-agent-110229/components/
bash-5.2$ ./apm-server
bash-5.2$ cd -
/usr/share/elastic-agent
bash-5.2$ /usr/share/elastic-agent/data/elastic-agent-110229/components/apm-server
Error: error loading config file: stat apm-server.yml: no such file or directory
Usage:
apm-server [flags]
apm-server [command]
Available Commands:
apikey Manage API Keys for communication between APM agents and server (deprecated)
export Export current config
help Help about any command
keystore Manage secrets keystore
run Run APM Server
test Test config
version Show current version info
Flags:
-E, --E setting=value Configuration overwrite
-N, --N Disable actual publishing for testing
-c, --c string Configuration file, relative to path.config (default "apm-server.yml")
--cpuprofile string Write cpu profile to file
-d, --d stringArray Enable certain debug selectors
-e, --e Log to stderr and disable syslog/file output
--environment string Set the environment in which the process is running (default "default")
-h, --help help for apm-server
--httpprof string Start pprof http server
--memprofile string Write memory profile to this file
--path.config string Configuration path
--path.data string Data path
--path.home string Home path
--path.logs string Logs path
--strict.perms Strict permission checking on config files (default true)
-v, --v Log at INFO level
Use "apm-server [command] --help" for more information about a command.
bash-5.2$ pwd
/usr/share/elastic-agent
@kruskall it seems that
Edit: It seems like apm-server does not honor the |
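The transcript above is consistent with a relative file name being looked up against the current working directory: the binary starts from inside the components directory but fails with "stat apm-server.yml: no such file or directory" when launched from elsewhere. The sketch below reproduces only that generic shell behaviour with a throwaway directory; the names `lookup` and `example.yml` are hypothetical, and it says nothing about apm-server's actual lookup logic.

```shell
# Throwaway layout: a "components" directory holding a config-like file.
workdir=$(mktemp -d)
mkdir "$workdir/components"
: > "$workdir/components/example.yml"

# lookup DIR (hypothetical helper): check for the relative name example.yml
# while the working directory is DIR. The subshell keeps the caller's cwd intact.
lookup() {
    ( cd "$1" && if [ -e example.yml ]; then echo found; else echo missing; fi )
}

lookup "$workdir/components"   # cwd is the directory holding the file
lookup "$workdir"              # cwd is somewhere else

rm -rf "$workdir"
```

The first call finds the file and the second does not, even though the file itself never moves.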
After testing with apm-server that included a modified version of beats with additional logging:
The issue seems to have been introduced with commit elastic/apm-server@7f85cd6 on the apm-server side (up to commit elastic/apm-server@46a9a96, apm-server starts correctly under agent) |
Hi Team, We have re-validated this issue on the latest 8.16.0 SNAPSHOT Kibana cloud environment and found it fixed now. Observations:
Build details: Hence, we are marking as QA: Validated. Thanks |
Deployment Links:
Description:
Hosted Fleet Server becomes unhealthy on the 8.16 Snapshot, and we have observed that the APM integration shows an error.
Build details:
VERSION: 8.16.0 SNAPSHOT
BUILD: 78494
COMMIT: 156a76cb03e60a89792f905642817405002099a1
Screenshot