-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add a DEB integration test #4301
Conversation
This pull request does not have a backport label. Could you fix it @leehinman? 🙏
NOTE: |
c9465c0
to
c3bb8a1
Compare
44b4692
to
fe02cc3
Compare
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
483028d
to
25002c5
Compare
5c4a06c
to
93ea3e1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, couple questions before approving.
Looking at the test time using a sample size of 1:
- This PR's integration test step took 2h11 from https://buildkite.com/elastic/elastic-agent/builds/7435#018dd27c-0e2a-47c9-8e70-14b8a46d9677
- The most recent success on main took 1h52m from https://buildkite.com/elastic/elastic-agent/builds/7437#018dd291-acc2-46e4-a2ed-ec33f9c51afb
That's an additional 19 minutes which is a sizeable increase. I am wondering if we can shave some of that off by not building the packages we don't need yet, namely rpm
and docker
. At the same time, we need them eventually.
I have a strong suspicion the docker packages cost the most time, because it builds several container variants for each architecture, and even if we were testing it we'd only need one of them.
@@ -19,7 +19,7 @@ else | |||
OVERRIDE_TEST_AGENT_VERSION="" | |||
fi | |||
# PACKAGE | |||
AGENT_PACKAGE_VERSION="${OVERRIDE_AGENT_PACKAGE_VERSION}" DEV=true EXTERNAL=true SNAPSHOT=true PLATFORMS=linux/amd64,linux/arm64,windows/amd64 PACKAGES=tar.gz,zip mage package | |||
AGENT_PACKAGE_VERSION="${OVERRIDE_AGENT_PACKAGE_VERSION}" DEV=true EXTERNAL=true SNAPSHOT=true PLATFORMS=linux/amd64,linux/arm64,windows/amd64 mage package |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By removing PACKAGES, I think this also builds all the Docker container variants we don't need. Not wrong, but slower than it could be.
Does adding deb,rpm
to the packages list or does it cause problems with windows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it faster if we just build Windows with a separate invocation of mage package
? I'm not sure if the underlying cross-compilation is actually parallelized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also make the windows target smart enough to ignore the deb and rpm packages if that was in fact the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, forgot to add the comment on this. I tried with PACKAGES="tar.gz,zip,deb,rpm"
and it failed to build in CI, "compiled mage file failed to run" if I remember correctly. So I think we have a bug in the mage file, I'm still looking into it. If I don't find it today or tomorrow I'll open up an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, not opposed to optimizing this separate from merging this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I finally found the timing issue in the magefile for building with the multiple package types defined. We tried build all the package types at the same time. So if you have "zip" and the timing is right you can have a compiled magefile that isn't for the GOOS you want. The "fix" will do the GOOS-ARCH in parallel, so all linux-amd64, then all linux-arm, then windows-amd64.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly the timing vs main is still exactly the same as what it was before (caveat of sample size of 1 still applies):
- 1h52m for a recent build on main https://buildkite.com/elastic/elastic-agent/builds/7645
- 2h11m for the most recent build here https://buildkite.com/elastic/elastic-agent/builds/7624
Blake's suggestion for the future in #4301 (comment) might be a way to reclaim some of this time but not uploading an unnecessary .tar.gz package. Probably worth putting that in an issue so it isn't lost in this PR.
} | ||
|
||
// apt install doesn't enroll, so need to do that | ||
enrollArgs := []string{"elastic-agent", "enroll"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this affected by #4084? Or are we just logging errors and not actually changing the exit code of the command?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am interested to know why this doesn't doesn't catch that problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I'm following. If you run dpkg -i elastic-agent-8.13.0-SNAPSHOT-arm64.deb
, to "install" the deb package, there is no place to add the enroll token. So that is what is happening here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right you need to run elastic-agent enroll
. The "bug" in #4084 is that if you do this right after dpkg -i
it fails with an error because it can't restart the agent, because the service isn't running yet.
I now see you don't hit this problem because you systemctl start elastic-agent
first. This is not what our default instructions for installing the deb suggest however, although it is a totally reasonable thing to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add systemctl start elastic-agent
to the DEB postinst script. Any reason not to do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would prevent having elastic-agent enroll --delay-enroll
work reliably now that I think about it a bit.
The use case would be to install the deb in a VM image and set it to enrol when it next boots up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -143,6 +211,7 @@ func testMonitoringLogsAreShipped( | |||
"add_cloud_metadata: received error failed requesting openstack metadata: Get \\\"https://169.254.169.254/2009-04-04/meta-data/instance-type\\\": dial tcp 169.254.169.254:443: connect: connection refused", // okay for the cloud metadata to not work | |||
"add_cloud_metadata: received error failed with http status code 404", // okay for the cloud metadata to not work | |||
"add_cloud_metadata: received error failed fetching EC2 Identity Document: operation error ec2imds: GetInstanceIdentityDocument, http response error StatusCode: 404, request to EC2 IMDS failed", // okay for the cloud metadata to not work | |||
"failed to invoke rollback watcher: failed to start Upgrade Watcher: fork/exec /var/lib/elastic-agent/elastic-agent: no such file or directory", //on debian this happens probably need to fix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this only happens for the DEB? If we are the DEB we probably shouldn't even be launching the watcher, or I suppose we could fix the path.
Either way this is worth a follow up issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -19,7 +19,7 @@ else | |||
OVERRIDE_TEST_AGENT_VERSION="" | |||
fi | |||
# PACKAGE | |||
AGENT_PACKAGE_VERSION="${OVERRIDE_AGENT_PACKAGE_VERSION}" DEV=true EXTERNAL=true SNAPSHOT=true PLATFORMS=linux/amd64,linux/arm64,windows/amd64 PACKAGES=tar.gz,zip mage package | |||
AGENT_PACKAGE_VERSION="${OVERRIDE_AGENT_PACKAGE_VERSION}" DEV=true EXTERNAL=true SNAPSHOT=true PLATFORMS=linux/amd64,linux/arm64,windows/amd64 mage package |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
// name. This policy does not contain any integration. | ||
t.Log("Enrolling agent in Fleet with a test policy") | ||
createPolicyReq := kibana.AgentPolicy{ | ||
Name: fmt.Sprintf("test-policy-enroll-%d", time.Now().Unix()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[NIT]
IIRC, most tests do so by adding a uuid to the policy name.
gtesting "testing" | ||
) | ||
|
||
func TestGetPackageSuffix(t *gtesting.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good, I have just a small doubt about defining a new Deb group (this entails provisioning an extra runner) as I think we could put the test into the default group and skip it on non-debian OSes
@@ -30,4 +30,7 @@ const ( | |||
|
|||
// Upgrade group of tests. Used for testing upgrades. | |||
Upgrade = "upgrade" | |||
|
|||
// Deb group of tests. Used for testing .deb packages install & upgrades | |||
Deb = "deb" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we define a separate group here? we could run the deb tests on any linux machine so I would be inclined to add them to default/other existing group (we save spawning one extra runner too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we define a separate group here? we could run the deb tests on any linux machine so I would be inclined to add them to default/other existing group (we save spawning one extra runner too)
Abundance of caution and me not knowing all the internals of the testing suite :-). I'll drop this. FYI we will have to add Rocky or Alma linux as target distros for the RPM tests. We really need to test DEB & RPM in their native environments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean we don't necessarily need to drop it, putting it into the default group at the moment would save us the overhead. When we add the other runners we may want to re-assess the grouping and see what we can do.
I think ubuntu for the moment should do as a distro, maybe later we want to have proper debian server or other distros that we want to focus our testing on)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't drop this. I think the isolation tests between deb/rpm's and install
command installations are good to have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pchila & @blakerouse are you OK if we keep the "Deb" group for now, so we can get some deb testing started. And we come to a consensus when I work on the RPM test cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good for me 👍
@@ -143,6 +211,7 @@ func testMonitoringLogsAreShipped( | |||
"add_cloud_metadata: received error failed requesting openstack metadata: Get \\\"https://169.254.169.254/2009-04-04/meta-data/instance-type\\\": dial tcp 169.254.169.254:443: connect: connection refused", // okay for the cloud metadata to not work | |||
"add_cloud_metadata: received error failed with http status code 404", // okay for the cloud metadata to not work | |||
"add_cloud_metadata: received error failed fetching EC2 Identity Document: operation error ec2imds: GetInstanceIdentityDocument, http response error StatusCode: 404, request to EC2 IMDS failed", // okay for the cloud metadata to not work | |||
"failed to invoke rollback watcher: failed to start Upgrade Watcher: fork/exec /var/lib/elastic-agent/elastic-agent: no such file or directory", //on debian this happens probably need to fix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
93ea3e1
to
c8a865f
Compare
@@ -30,4 +30,7 @@ const ( | |||
|
|||
// Upgrade group of tests. Used for testing upgrades. | |||
Upgrade = "upgrade" | |||
|
|||
// Deb group of tests. Used for testing .deb packages install & upgrades | |||
Deb = "deb" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't drop this. I think the isolation tests between deb/rpm's and install
command installations are good to have.
}, | ||
Local: false, | ||
Sudo: true, | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not saying do it in this PR, but another way of handling the different between deb/RPM vs install
subcommand would be to define it in the test requirements. An example could be to change to:
info := define.Require(t, define.Requirements{
Group: Deb,
Stack: &define.Stack{},
Packaging: define.Deb,
Local: false,
Sudo: true,
})
The testing framework could then look at the Packaging: define.Deb
and determine automatically that this needs to be on an OS that supports deb's. Each OS can say what it supports in its definition. Then when running the framework with MATRIX="true"
it could run on Ubuntu, Debian, etc. By default it would just pick Ubuntu. Doing it this way would also allow only the deb
to be copied to the host. Currently with this change, it is going to copy both the targz
and deb
to the host, when both are not needed.
a2af0fd
to
bce8d27
Compare
Quality Gate passedKudos, no new issues were introduced! 0 New issues |
@blakerouse @pchila @cmacknz are we "OK" to merge or anything else we have to address for this PR? We absolutely have changes that need to happen before adding RPM. |
I am fine to defer future improvements to other PRs and issues. The only thing I noticed after approving is that #4334 was merged before this. We could add a check that the files were actually deleting after running |
This reverts commit 4fd31f1. The Serverless Beats Tests step in our daily builds started failing with: > Error: error running test: missing required Elastic Agent package builds for integration runner to execute: /tmp/beats-build/beats/x-pack/metricbeat/build/distributions/elastic-agent-8.14.0-SNAPSHOT-linux-x86_64.tar.gz, /tmp/beats-build/beats/x-pack/metricbeat/build/distributions/elastic-agent-8.14.0-SNAPSHOT-linux-x86_64.tar.gz.sha512, /tmp/beats-build/beats/x-pack/metricbeat/build/distributions/elastic-agent-8.14.0-SNAPSHOT-x86_64.rpm, /tmp/beats-build/beats/x-pack/metricbeat/build/distributions/elastic-agent-8.14.0-SNAPSHOT-x86_64.rpm.sha512, /tmp/beats-build/beats/x-pack/metricbeat/build/distributions/elastic-agent-8.14.0-SNAPSHOT-amd64.deb, /tmp/beats-build/beats/x-pack/metricbeat/build/distributions/elastic-agent-8.14.0-SNAPSHOT-amd64.deb.sha512 ~/builds/bk-agent-prod-gcp-1710406561496364062/elastic/elastic-agent The last successful build is https://buildkite.com/elastic/elastic-agent/builds/7707 The first failure is https://buildkite.com/elastic/elastic-agent/builds/7735 There were only 2 changes made between these, this change is one of them. Let's try to revert this first because it has the least impact.
What does this PR do?
Add an integration test that installs elastic-agent via the DEB package and makes sure that logs are sent to Elasticsearch.
Why is it important?
Need to test that DEB packages work.
Checklist
- [ ] I have added an entry in./changelog/fragments
using the changelog toolAuthor's Checklist
How to test this PR locally
Related issues
Use cases
Screenshots
Logs
Questions to ask yourself