[Fleet]: On enrolling RPM and Deb agents, "Restarting agent failed" error is displayed in CLI.
#4084
Comments
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
Secondary Review for this ticket is Done. |
I can reproduce this. I suspect it is another unintended consequence of #3815, where we now always consider a failure to restart with the control socket a fatal error. The agent service isn't automatically started after installing the package:
ubuntu@valuable-gudgeon:~$ sudo dpkg -i ./elastic-agent-8.12.0-arm64.deb
Selecting previously unselected package elastic-agent.
(Reading database ... 66270 files and directories currently installed.)
Preparing to unpack .../elastic-agent-8.12.0-arm64.deb ...
Unpacking elastic-agent (8.12.0) ...
Setting up elastic-agent (8.12.0) ...
found symlink /usr/share/elastic-agent/bin/elastic-agent, unlink
create symlink /usr/share/elastic-agent/bin/elastic-agent to /var/lib/elastic-agent/data/elastic-agent-5cbf2e/elastic-agent
ubuntu@valuable-gudgeon:~$ sudo systemctl status elastic-agent
○ elastic-agent.service - Agent manages other beats based on configuration provided.
Loaded: loaded (/lib/systemd/system/elastic-agent.service; disabled; vendor preset: enabled)
Active: inactive (dead)
Docs: https://www.elastic.co/beats/elastic-agent
ubuntu@valuable-gudgeon:~$ sudo elastic-agent enroll --url=https://2d8b862d544f4fbca4ff375dfae3b19f.fleet.eastus2.staging.azure.foundit.no:443 --enrollment-token=Qmtvei1vd0JvRFNMYWwxdC04bTU6R3lldEtHc01SYW1iQy1pYU9qOFRsZw==
This will replace your current settings. Do you want to continue? [Y/n]:y
{"log.level":"info","@timestamp":"2024-01-15T11:35:48.449-0500","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":496},"message":"Starting enrollment to URL: https://XXXXX.fleet.eastus2.staging.azure.foundit.no:443/","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-01-15T11:35:49.770-0500","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":461},"message":"Restarting agent daemon, attempt 0","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-01-15T11:35:49.771-0500","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":475},"message":"Restart attempt 0 failed: 'rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial unix /var/lib/elastic-agent/data/tmp/elastic-agent-control.sock: connect: no such file or directory\"'. Waiting for 2s","ecs.version":"1.6.0"}
The instructions for enrolling a DEB in Fleet already include manually starting the service for this reason:
|
I should note that the error here doesn't mean the enrollment failed. Enrollment actually succeeded, and if you ignore the error and continue with the following, the agent successfully connects to Fleet.
|
We should just need to pass the
|
You can also avoid the error by starting the agent service before enrolling:
sudo systemctl enable elastic-agent
sudo systemctl start elastic-agent
|
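The "start the service before enrolling" workaround could be scripted roughly as follows. This is only a sketch, not something the agent or Fleet ships: `ensure_agent_running` is a hypothetical helper name, `FLEET_URL` and `ENROLLMENT_TOKEN` are placeholder variables, and it assumes the script runs as root on a systemd host.

```shell
#!/bin/sh
# Sketch: make sure the elastic-agent service is up before enrolling, so the
# restart over the control socket has something to talk to.
# ensure_agent_running is a hypothetical helper, not part of elastic-agent.

ensure_agent_running() {
    # `systemctl is-active --quiet` exits 0 only when the unit is active.
    if ! systemctl is-active --quiet elastic-agent; then
        systemctl enable elastic-agent && systemctl start elastic-agent
    fi
}

# Usage (not executed here; FLEET_URL and ENROLLMENT_TOKEN are placeholders):
#   ensure_agent_running &&
#       elastic-agent enroll --url="$FLEET_URL" --enrollment-token="$ENROLLMENT_TOKEN"
```

The helper only touches the service when it is not already active, so running it repeatedly should be harmless.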
An alternative to fixing this in the agent is to change the instructions in Fleet to start the service before enrolling:
This is what we have today:
curl -L -O https://artifacts.elastic.co/downloads/beats/elastic-agent/elastic-agent-8.12.2-amd64.deb
sudo dpkg -i elastic-agent-8.12.2-amd64.deb
sudo elastic-agent enroll --url=https://XXXXX.fleet.eastus2.staging.azure.foundit.no:443 --enrollment-token=XXXXX
sudo systemctl enable elastic-agent
sudo systemctl start elastic-agent
We are also investigating automatically starting the service as part of the deb/rpm installer.
|
Hello @cmacknz
While this is true, this has some impact when using automation tools. For example, Ansible relies on the exit code of the previous command to know whether it can continue to the next task in the playbook or exit with an error, currently the I was helping one of the infra teams in my company write an Ansible playbook to deploy the agents and spent some time troubleshooting why it was not working and always failing at the enrollment step. I was only able to fix the playbook because I found this issue and the undocumented flag After that, I tested on another server and using Since the next steps consist of enabling the systemd service and starting it, we chose to use |
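For automation that only sees the exit code, one possible approach is to also inspect the captured output and tell this particular restart failure apart from other errors. The sketch below is an assumption, not a documented interface: `is_control_socket_error` is a hypothetical helper, and it simply matches the control-socket message quoted in the log earlier in this issue.

```shell
#!/bin/sh
# Sketch: detect the control-socket restart failure in captured enroll output.
# is_control_socket_error is a hypothetical helper; the pattern is taken from
# the error message quoted in this issue and may differ between versions.

is_control_socket_error() {
    # $1: captured output of the enroll command.
    case "$1" in
        *"elastic-agent-control.sock: connect: no such file or directory"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (not executed here; FLEET_URL and ENROLLMENT_TOKEN are placeholders,
# and the interactive confirmation prompt would need separate handling):
#   output=$(elastic-agent enroll --url="$FLEET_URL" --enrollment-token="$ENROLLMENT_TOKEN" 2>&1)
#   status=$?
#   if [ "$status" -ne 0 ] && is_control_socket_error "$output"; then
#       echo "restart failed because the agent service is not running" >&2
#   fi
```

Matching on log text is brittle, so this would be a stopgap at best; starting the service before enrolling avoids the error in the first place.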
Kibana Build details:
Host OS and Browser version: All, All
Preconditions:
Steps to reproduce:
"Restarting agent failed" error is displayed in CLI.
What's working fine:
Expected:
On enrolling RPM and Deb agents, the "Restarting agent failed" error should not be displayed in the CLI.
Screenshot: