[Flaky Test]: TestRpmLogIngestFleetManaged/Monitoring_logs_are_shipped – failed to evaluate all symlinks #4929

Closed
rdner opened this issue Jun 13, 2024 · 6 comments
Assignees
Labels
flaky-test Unstable or unreliable test cases. Team:Elastic-Agent Label for the Agent team

Comments

@rdner
Member

rdner commented Jun 13, 2024

Failing test case

TestRpmLogIngestFleetManaged/Monitoring_logs_are_shipped

Error message

failed to evaluate all symlinks

Build

https://buildkite.com/elastic/elastic-agent-extended-testing/builds/625#0190110f-8604-4125-9789-621c5241ef2b

OS

Linux

Stacktrace and notes

=== RUN   TestRpmLogIngestFleetManaged/Monitoring_logs_are_shipped
    logs_ingestion_test.go:489: Making sure metricbeat logs are populated
    logs_ingestion_test.go:493: metricbeat: Got 300 documents
    logs_ingestion_test.go:498: Making sure all components are healthy
    logs_ingestion_test.go:500: 
        	Error Trace:	/home/rhel/agent/testing/integration/logs_ingestion_test.go:500
        	            				/home/rhel/agent/testing/integration/logs_ingestion_test.go:241
        	Error:      	Received unexpected error:
        	            	could not unmarshal agent status output: error: error creating cmd: failed to get control protcol address: failed to evaluate all symlinks of /tmp/TestRpmLogIngestFleetManaged3099371811/001/elastic-agent-8.15.0-SNAPSHOT-x86_64: lstat /tmp/TestRpmLogIngestFleetManaged3099371811/001/elastic-agent-8.15.0-SNAPSHOT-x86_64: no such file or directory, output: 
        	            	unexpected end of JSON input
        	Test:       	TestRpmLogIngestFleetManaged/Monitoring_logs_are_shipped
        	Messages:   	could not get agent status to verify all components are healthy
--- FAIL: TestRpmLogIngestFleetManaged/Monitoring_logs_are_shipped (15.30s)
@rdner rdner added Team:Elastic-Agent Label for the Agent team flaky-test Unstable or unreliable test cases. labels Jun 13, 2024
@elasticmachine
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@rdner rdner changed the title from "[Flaky Test]: TestRpmLogIngestFleetManaged/Monitoring_logs_are_shipped – <short_error_message>" to "[Flaky Test]: TestRpmLogIngestFleetManaged/Monitoring_logs_are_shipped – failed to evaluate all symlinks" Jun 13, 2024
@cmacknz
Member

cmacknz commented Jun 13, 2024

The failing line is:

cAddr, err := control.AddressFromPath(f.operatingSystem, f.workDir)
if err != nil {
	return nil, fmt.Errorf("failed to get control protcol address: %w", err)
}
agentClient := client.New(client.WithAddress(cAddr))

failed to evaluate all symlinks of /tmp/TestRpmLogIngestFleetManaged3099371811/001/elastic-agent-8.15.0-SNAPSHOT-x86_64: no such file or directory

The DEB and RPM tests are the ones failing, which makes some sense: those packages do not install anything under /tmp, so a /tmp path cannot be the path to the agent command. For those packages the symlink ends up under /var/lib.

/var/lib/{{.BeatName}}/data/{{.BeatName}}-{{agent_package_version}}{{snapshot_suffix}}-{{ commit_short }}/{{.BeatName}}{{.BinaryExt}}:

I think the DEB and RPM install commands never set up a client at the right socket path. More than anything, I am surprised this failure doesn't happen every time.

Here is the client getting created on the regular install path:

// we just installed agent, the control socket is at a well-known location
socketPath := fmt.Sprintf("unix://%s", paths.ControlSocketRunSymlink) // use symlink as that works for all versions
if runtime.GOOS == "windows" {
	// Windows uses a fixed named pipe, that is always the same.
	// It is the same even running in unprivileged mode.
	socketPath = paths.WindowsControlSocketInstalledPath
} else if !installOpts.Privileged {
	// Unprivileged versions move the socket to inside the installed directory
	// of the Elastic Agent.
	socketPath = paths.ControlSocketFromPath(runtime.GOOS, f.workDir)
}
c := client.New(client.WithAddress(socketPath))
f.setClient(c)

Here is the DEB install, which does not set this up, so we fall back to looking for the control socket in the work dir, which is a temporary directory.

enrollArgs := []string{"elastic-agent", "enroll"}

The DEB and RPM installs are also missing the cleanup calls to get diagnostics and dump processes.

f.t.Cleanup(func() {
	// check for running agents after uninstall had a chance to run
	assert.Empty(f.t, getElasticAgentProcesses(f.t), "there should be no running agent at the end of the test")
})

@ycombinator
Contributor

@rdner
Member Author

rdner commented Aug 19, 2024

@ycombinator the failure you linked seems to be new/unrelated, see #5311

@VihasMakwana
Contributor

This test is passing now, thanks to #4938.
Offending code.
Closing this.
