Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a DEB integration test #4301

Merged
merged 3 commits into from
Mar 12, 2024
Merged

Conversation

leehinman
Copy link
Contributor

What does this PR do?

Add an integration test that installs elastic-agent via the DEB package and makes sure that logs are sent to Elasticsearch.

Why is it important?

Need to test that DEB packages work.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
    - [ ] I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Author's Checklist

  • [ ]

How to test this PR locally

mage integration:single TestDebLogIngestFleetManaged

Related issues

Use cases

Screenshots

Logs

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

Copy link
Contributor

mergify bot commented Feb 21, 2024

This pull request does not have a backport label. Could you fix it @leehinman? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@leehinman leehinman force-pushed the 4086_integration_deb_rpm branch from c9465c0 to c3bb8a1 Compare February 21, 2024 04:29
@leehinman leehinman force-pushed the 4086_integration_deb_rpm branch 3 times, most recently from 44b4692 to fe02cc3 Compare February 21, 2024 04:43
@leehinman leehinman marked this pull request as ready for review February 21, 2024 04:51
@leehinman leehinman requested a review from a team as a code owner February 21, 2024 04:51
@leehinman leehinman added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Feb 21, 2024
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@leehinman leehinman force-pushed the 4086_integration_deb_rpm branch 2 times, most recently from 483028d to 25002c5 Compare February 21, 2024 20:47
@cmacknz cmacknz requested a review from pchila February 21, 2024 20:55
@leehinman leehinman force-pushed the 4086_integration_deb_rpm branch 5 times, most recently from 5c4a06c to 93ea3e1 Compare February 22, 2024 20:21
Copy link
Member

@cmacknz cmacknz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, couple questions before approving.

Looking at the test time using a sample size of 1:

That's an additional 19 minutes which is a sizeable increase. I am wondering if we can shave some of that off by not building the packages we don't need yet, namely rpm and docker. At the same time, we need them eventually.

I have a strong suspicion the docker packages cost the most time, because it builds several container variants for each architecture, and even if we were testing it we'd only need one of them.

@@ -19,7 +19,7 @@ else
OVERRIDE_TEST_AGENT_VERSION=""
fi
# PACKAGE
AGENT_PACKAGE_VERSION="${OVERRIDE_AGENT_PACKAGE_VERSION}" DEV=true EXTERNAL=true SNAPSHOT=true PLATFORMS=linux/amd64,linux/arm64,windows/amd64 PACKAGES=tar.gz,zip mage package
AGENT_PACKAGE_VERSION="${OVERRIDE_AGENT_PACKAGE_VERSION}" DEV=true EXTERNAL=true SNAPSHOT=true PLATFORMS=linux/amd64,linux/arm64,windows/amd64 mage package
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By removing PACKAGES, I think this also builds all the Docker container variants we don't need. Not wrong, but slower than it could be.

Does adding deb,rpm to the packages list or does it cause problems with windows?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it faster if we just build Windows with a separate invocation of mage package? I'm not sure if the underlying cross-compilation is actually parallelized.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also make the windows target smart enough to ignore the deb and rpm packages if that was in fact the problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, forgot to add the comment on this. I tried with PACKAGES="tar.gz,zip,deb,rpm" and it failed to build in CI, "compiled mage file failed to run" if I remember correctly. So I think we have a bug in the mage file, I'm still looking into it. If I don't find it today or tomorrow I'll open up an issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, not opposed to optimizing this separate from merging this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I finally found the timing issue in the magefile for building with the multiple package types defined. We tried build all the package types at the same time. So if you have "zip" and the timing is right you can have a compiled magefile that isn't for the GOOS you want. The "fix" will do the GOOS-ARCH in parallel, so all linux-amd64, then all linux-arm, then windows-amd64.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly the timing vs main is still exactly the same as what it was before (caveat of sample size of 1 still applies):

Blake's suggestion for the future in #4301 (comment) might be a way to reclaim some of this time but not uploading an unnecessary .tar.gz package. Probably worth putting that in an issue so it isn't lost in this PR.

}

// apt install doesn't enroll, so need to do that
enrollArgs := []string{"elastic-agent", "enroll"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this affected by #4084? Or are we just logging errors and not actually changing the exit code of the command?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am interested to know why this doesn't doesn't catch that problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I'm following. If you run dpkg -i elastic-agent-8.13.0-SNAPSHOT-arm64.deb, to "install" the deb package, there is no place to add the enroll token. So that is what is happening here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right you need to run elastic-agent enroll. The "bug" in #4084 is that if you do this right after dpkg -i it fails with an error because it can't restart the agent, because the service isn't running yet.

I now see you don't hit this problem because you systemctl start elastic-agent first. This is not what our default instructions for installing the deb suggest however, although it is a totally reasonable thing to do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add systemctl start elastic-agent to the DEB postinst script. Any reason not to do that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would prevent having elastic-agent enroll --delay-enroll work reliably now that I think about it a bit.

The use case would be to install the deb in a VM image and set it to enrol when it next boots up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this requires it's own issue to discuss I added #4381 to do that. I think we can just leave this as-is for this PR, and adjust when we get #4381 figured out.

pkg/testing/fixture_install.go Show resolved Hide resolved
@@ -143,6 +211,7 @@ func testMonitoringLogsAreShipped(
"add_cloud_metadata: received error failed requesting openstack metadata: Get \\\"https://169.254.169.254/2009-04-04/meta-data/instance-type\\\": dial tcp 169.254.169.254:443: connect: connection refused", // okay for the cloud metadata to not work
"add_cloud_metadata: received error failed with http status code 404", // okay for the cloud metadata to not work
"add_cloud_metadata: received error failed fetching EC2 Identity Document: operation error ec2imds: GetInstanceIdentityDocument, http response error StatusCode: 404, request to EC2 IMDS failed", // okay for the cloud metadata to not work
"failed to invoke rollback watcher: failed to start Upgrade Watcher: fork/exec /var/lib/elastic-agent/elastic-agent: no such file or directory", //on debian this happens probably need to fix.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this only happens for the DEB? If we are the DEB we probably shouldn't even be launching the watcher, or I suppose we could fix the path.

Either way this is worth a follow up issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -19,7 +19,7 @@ else
OVERRIDE_TEST_AGENT_VERSION=""
fi
# PACKAGE
AGENT_PACKAGE_VERSION="${OVERRIDE_AGENT_PACKAGE_VERSION}" DEV=true EXTERNAL=true SNAPSHOT=true PLATFORMS=linux/amd64,linux/arm64,windows/amd64 PACKAGES=tar.gz,zip mage package
AGENT_PACKAGE_VERSION="${OVERRIDE_AGENT_PACKAGE_VERSION}" DEV=true EXTERNAL=true SNAPSHOT=true PLATFORMS=linux/amd64,linux/arm64,windows/amd64 mage package
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

// name. This policy does not contain any integration.
t.Log("Enrolling agent in Fleet with a test policy")
createPolicyReq := kibana.AgentPolicy{
Name: fmt.Sprintf("test-policy-enroll-%d", time.Now().Unix()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[NIT]
IIRC, most tests do so by adding a uuid to the policy name.

gtesting "testing"
)

func TestGetPackageSuffix(t *gtesting.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

Copy link
Member

@pchila pchila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good, I have just a small doubt about defining a new Deb group (this entails provisioning an extra runner) as I think we could put the test into the default group and skip it on non-debian OSes

@@ -30,4 +30,7 @@ const (

// Upgrade group of tests. Used for testing upgrades.
Upgrade = "upgrade"

// Deb group of tests. Used for testing .deb packages install & upgrades
Deb = "deb"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we define a separate group here? we could run the deb tests on any linux machine so I would be inclined to add them to default/other existing group (we save spawning one extra runner too)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we define a separate group here? we could run the deb tests on any linux machine so I would be inclined to add them to default/other existing group (we save spawning one extra runner too)

Abundance of caution and me not knowing all the internals of the testing suite :-). I'll drop this. FYI we will have to add Rocky or Alma linux as target distros for the RPM tests. We really need to test DEB & RPM in their native environments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean we don't necessarily need to drop it, putting it into the default group at the moment would save us the overhead. When we add the other runners we may want to re-assess the grouping and see what we can do.
I think ubuntu for the moment should do as a distro, maybe later we want to have proper debian server or other distros that we want to focus our testing on)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't drop this. I think the isolation tests between deb/rpm's and install command installations are good to have.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pchila & @blakerouse are you OK if we keep the "Deb" group for now, so we can get some deb testing started. And we come to a consensus when I work on the RPM test cases?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good for me 👍

@@ -143,6 +211,7 @@ func testMonitoringLogsAreShipped(
"add_cloud_metadata: received error failed requesting openstack metadata: Get \\\"https://169.254.169.254/2009-04-04/meta-data/instance-type\\\": dial tcp 169.254.169.254:443: connect: connection refused", // okay for the cloud metadata to not work
"add_cloud_metadata: received error failed with http status code 404", // okay for the cloud metadata to not work
"add_cloud_metadata: received error failed fetching EC2 Identity Document: operation error ec2imds: GetInstanceIdentityDocument, http response error StatusCode: 404, request to EC2 IMDS failed", // okay for the cloud metadata to not work
"failed to invoke rollback watcher: failed to start Upgrade Watcher: fork/exec /var/lib/elastic-agent/elastic-agent: no such file or directory", //on debian this happens probably need to fix.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@leehinman leehinman force-pushed the 4086_integration_deb_rpm branch from 93ea3e1 to c8a865f Compare March 7, 2024 01:07
@@ -30,4 +30,7 @@ const (

// Upgrade group of tests. Used for testing upgrades.
Upgrade = "upgrade"

// Deb group of tests. Used for testing .deb packages install & upgrades
Deb = "deb"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't drop this. I think the isolation tests between deb/rpm's and install command installations are good to have.

},
Local: false,
Sudo: true,
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not saying do it in this PR, but another way of handling the different between deb/RPM vs install subcommand would be to define it in the test requirements. An example could be to change to:

	info := define.Require(t, define.Requirements{
		Group: Deb,
		Stack: &define.Stack{},
                 Packaging: define.Deb,
		Local: false,
		Sudo:  true,
	})

The testing framework could then look at the Packaging: define.Deb and determine automatically that this needs to be on an OS that supports deb's. Each OS can say what it supports in its definition. Then when running the framework with MATRIX="true" it could run on Ubuntu, Debian, etc. By default it would just pick Ubuntu. Doing it this way would also allow only the deb to be copied to the host. Currently with this change, it is going to copy both the targz and deb to the host, when both are not needed.

@leehinman leehinman force-pushed the 4086_integration_deb_rpm branch from a2af0fd to bce8d27 Compare March 7, 2024 17:13
Copy link

Quality Gate passed Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No Coverage information No data about Coverage
0.0% 0.0% Duplication on New Code

See analysis details on SonarQube

@leehinman
Copy link
Contributor Author

@blakerouse @pchila @cmacknz are we "OK" to merge or anything else we have to address for this PR?

We absolutely have changes that need to happen before adding RPM.

@cmacknz
Copy link
Member

cmacknz commented Mar 8, 2024

I am fine to defer future improvements to other PRs and issues.

The only thing I noticed after approving is that #4334 was merged before this. We could add a check that the files were actually deleting after running apt-get purge as part of this test to ensure we don't regress on cleanup of the DEB package.

@leehinman leehinman merged commit 4fd31f1 into elastic:main Mar 12, 2024
9 checks passed
rdner added a commit that referenced this pull request Mar 15, 2024
rdner added a commit that referenced this pull request Mar 15, 2024
This reverts commit 4fd31f1.

The Serverless Beats Tests step in our daily builds started failing with:

> Error: error running test: missing required Elastic Agent package builds for integration runner to execute: /tmp/beats-build/beats/x-pack/metricbeat/build/distributions/elastic-agent-8.14.0-SNAPSHOT-linux-x86_64.tar.gz, /tmp/beats-build/beats/x-pack/metricbeat/build/distributions/elastic-agent-8.14.0-SNAPSHOT-linux-x86_64.tar.gz.sha512, /tmp/beats-build/beats/x-pack/metricbeat/build/distributions/elastic-agent-8.14.0-SNAPSHOT-x86_64.rpm, /tmp/beats-build/beats/x-pack/metricbeat/build/distributions/elastic-agent-8.14.0-SNAPSHOT-x86_64.rpm.sha512, /tmp/beats-build/beats/x-pack/metricbeat/build/distributions/elastic-agent-8.14.0-SNAPSHOT-amd64.deb, /tmp/beats-build/beats/x-pack/metricbeat/build/distributions/elastic-agent-8.14.0-SNAPSHOT-amd64.deb.sha512
~/builds/bk-agent-prod-gcp-1710406561496364062/elastic/elastic-agent

The last successful build is https://buildkite.com/elastic/elastic-agent/builds/7707
The first failure is https://buildkite.com/elastic/elastic-agent/builds/7735

There were only 2 changes made between these, this change is one of them. Let's try to revert this first because it has the least impact.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The integration test framework should test the DEB and RPM installations of the agent
6 participants