Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[connector/datadog] Fix leaking goroutines and enable goleak #30500

Closed

Conversation

crobert-1
Copy link
Member

Description:

goleak detected some leaking goroutines in this package. Some are due to issues we've already hit, some are actual leaks in the collector. The added Stop calls for the TraceWriter and StatsWriter were required to make sure we don't leak goroutines on shutdown.

Link to tracking Issue:
#30438

Testing:
goleak test is passing. Calling the Shutdown methods in the test now take quite a while due to how many goroutines are creating when the TracesAgent is created.

Documentation:
Help on the changelog message is appreciated, not sure what details would make sense to users and what wouldn't.

@crobert-1
Copy link
Member Author

crobert-1 commented Jan 13, 2024

Looks like tests are failing due to timeouts from Shutdown taking too long.

The test output shows leaked goroutines, but I believe this is simply Shutdown not completing before it timed out.

goleak output:

goroutine 7 [chan receive]:
github.com/DataDog/datadog-agent/pkg/trace/writer.(*sender).loop(0xc000494fc0)
	/home/runner/go/pkg/mod/github.com/!data!dog/datadog-agent/pkg/[email protected]/writer/sender.go:170 +0x8c
created by github.com/DataDog/datadog-agent/pkg/trace/writer.newSender in goroutine 3
	/home/runner/go/pkg/mod/github.com/!data!dog/datadog-agent/pkg/[email protected]/writer/sender.go:163 +0x1a5

The senders are properly stopped by the datadog-agent TraceWriter.Stop() method.

It looks like datadogconnector.CreateTracesToMetrics creates 120 sender goroutines in its default configuration. NewTraceWriter creates another 16 goroutines (on my local machine). I'll test on my machine to see how long this ends up taking to complete (without a timeout).

@crobert-1
Copy link
Member Author

Did some more investigation, this is blocked by #30487 as explained here.

@crobert-1 crobert-1 closed this Jan 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant