[Integration Test] New Agent after upgrade fails to start #3085
The test in this PR is currently failing. That's because the Upgrade Watcher process seems to get killed off prematurely by the systemd service when it attempts to restart the failing fake Agent process. You can see this in action by starting the test, then SSH'ing into the VM where the test is running, and running the following command after the test attempts the upgrade.
For about a minute and a half, the above command will show that the fake Agent process is failing with a non-zero status code. It will also show the Upgrade Watcher process running.
Then, after about a minute and a half, systemd will attempt to restart the fake Agent process. You can tell this by seeing the
As an aside (likely a separate bug), looking through the Upgrade Watcher's log, I don't think the Crash Checker is retrieving the correct PID for the Agent process from systemd. Note the
Filed issues for both (potential) bugs mentioned in #3085 (comment):
This pull request is now in conflicts. Could you fix it? 🙏
e3d87a4 to 1d39efd
Currently blocked on #3274
1d39efd to af0d877
640711c to 4304b73
Pinging @elastic/elastic-agent (Team:Elastic-Agent)
Looks good!
t.Logf("Restarting Agent via service to simulate crashing")
err = install.RestartService(topPath)
require.NoError(t, err)
Nice! I really like the simplicity of making it think that something is wrong with it.
6faf005 to 96dca69
a6fcd66 to eadf520
SonarQube Quality Gate: 0 Bugs, No Coverage information
* Adding skeleton
* WIP
* Finish writing test
* Fixing upgrade checker completion check function name
* Fix client.Upgrade() call
* Fix other errors
* Increasing context timeout
* Trying a different way of crashing the upgraded Agent
* Trying yet another (much simpler!) way to crash upgraded Agent
* Remove OS restriction
* Use newer checks
* Fix expected versions in checks
* Fix changed function name
* Excluding generated mocks from code coverage analysis

(cherry picked from commit 99bc6d7)

# Conflicts:
#	sonar-project.properties
What does this PR do?
Relates #2176
This PR adds an integration test that ensures an Agent upgrade is rolled back because the new Agent (after upgrade) fails to start.
The test creates a fake Agent package containing a fake Agent binary that immediately exits with a non-zero status code. The test installs the locally-built Agent and then attempts to upgrade to the fake Agent package. The upgrade is expected to fail due to the new (fake) Agent binary failing to start successfully, and therefore the upgrade should be rolled back to the previous Agent version.
Why is it important?
To ensure the upgrade/rollback behavior is working as expected.
Checklist
* I have made corresponding changes to the documentation
* I have made corresponding changes to the default configuration files
* I have added tests that prove my fix is effective or that my feature works
* I have added an entry in ./changelog/fragments using the changelog tool
How to test this PR locally
Build Agent locally.
Run the test in this PR.
Related issues