Upgrade verifiers do not retry if the download fails or times out. #5163
Comments
Since you could download it manually later, my first thought is that this was a transient network error or a problem with our artifacts CDN. Is this still happening to your agents? Were you able to download the file while the agent was failing? If so, that may indicate the actual problem is that our download timeout for this file needs to be longer. |
I'm getting the same error as well, upgrading from 8.14.1 to 8.14.3. All I did was apply the upgrade again through the Fleet UI.
|
I could indeed download the .asc file manually later, while the upgrade was in progress. I tried upgrading twice in a row, the second time right after the first failure. The result was the same, so my only option was to download the file manually while the upgrade process was running, to compensate for the timeout. You're most likely right that the timeout period is too low. As for the agent, I don't have any outdated agent right now on which I could test this all over again. |
@cllasyx Would you mind timing your curl command from the same host as before, so we can get a sense of how long it's taking?
Thanks. |
I don't think the timing on this system matters. I can see in our code that the .asc download does not share a context timeout with the agent package download and does not have retries. elastic-agent/internal/pkg/agent/application/upgrade/step_download.go Lines 103 to 121 in ca726a2
In the case of the HTTP verifier, we make a single attempt to fetch it with a 30s timeout and no retries, which is definitely wrong. 30s is fine as the timeout for an individual request, but we should keep retrying as long as the overall upgrade download timeout has not expired. elastic-agent/internal/pkg/agent/application/upgrade/artifact/download/http/verifier.go Lines 166 to 185 in ca726a2
|
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
Do we agree the problem here was the download of the The PGP key download is not mandatory atm - as it will try anyway to use the one embedded in the binary itself? |
Yes, the problem was definitely the download of the .asc file used for PGP verification. |
The |
I didn't say the |
Hello, I have deployed Elastic Agent with Fleet Server in version 8.14.2 and tried to upgrade to 8.14.3 a few days later.
When watching the logs through Observability -> Logs -> Stream, I noticed some error messages from the elastic_agent dataset. The logs are provided below, along with a temporary fix.
Steps to reproduce:
Log output:
Bug fix (manual):
Notes
My Fleet Server host is listening on socket *:8220 under the domain name https://myfleet.example.com:8220. The host has another socket open, 127.0.0.1:8221, which is used for internal API operations. My firewall's OUTPUT chain accepts everything, and the INPUT chain has a rule that accepts all connections made to the loopback adapter: iptables -A INPUT -i lo -j ACCEPT