Unnecessary debug/error logged during idle connection teardown #40824
Comments
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
This error does not happen only in this situation; it can also happen with a legitimately interrupted connection, closed by either side. I don't think it's a good idea to fully conceal this error, since that would make it difficult to find some networking issues (proxy?) on the customer's side. So, we cannot simply change https://github.com/elastic/elastic-agent-libs/blob/01275338dc278335b4a8c0f23055014b4c0702dc/transport/logging.go#L48-L54 to ignore the error. If we want to address this issue properly, we need to find a way to handle this error in those "keep alive session maintenance" requests instead. The error is simply returned from the |
I think we should find the places where those "keep alive requests" happen and send them without the logger connection wrapper at all. We should not care about their success generally. For example, we could add a new function |
The keepalives are most probably on the connections to the Elasticsearch _bulk endpoint in the Beats. That is pretty much the only network connection they regularly make. The idle connection timeout is managed within the net/http implementation: https://cs.opensource.google/go/go/+/master:src/net/http/transport.go;drc=77e42fdeaf98d241dc09f4eb92e09225aea8f4c3;l=1090 |
@strawgate sounds like you spent some time debugging it, could you give us some pointers? Where did you see this HTTP request and how did you read the buffer? |
If you apply the breakpoint at https://github.com/elastic/elastic-agent-libs/blob/4babafd5ed1e5079acf74212ed3da01740b22de7/transport/logging.go#L50 and then wait for it to hit, do one step out, you'll land in net/http/transport.go at Read and you can read the buffer from the vscode debugger under Example: |
When I first looked at this I wondered if we could catch this by not logging when we get an error but also a non-zero
func (l *loggingConn) Read(b []byte) (int, error) {
	n, err := l.Conn.Read(b)
	if err != nil && !errors.Is(err, io.EOF) {
		l.logger.Debugf("Error reading from connection: %v", err)
	}
	return n, err
} |
Hello @strawgate, can you list the steps to reproduce this error? This is required to test the fix, thanks. |
I believe all you need to do is start a relatively new metricbeat (i was using |
I tried this locally, but it still throws an error. The buffer may or may not be empty when
The logger wrapper is attached to connections made to ES: https://github.com/khushijain21/beats/blob/main/libbeat/esleg/eslegclient/connection.go#L164. I think we may risk losing "all" connection details if we remove it. Any suggestion is appreciated. |
Just to summarize my thoughts on this issue:
|
After looking into internals, @khushijain21 and I decided to just remove the log line and let the consumer of the |
When an idle connection is torn down by the beat,
Error reading from connection: read tcp y.y.y.y:54668->x.x.x.x:443: use of closed network connection
is logged by github.com/elastic/elastic-agent-libs/transport.(*loggingConn).Read
Here: https://github.com/elastic/elastic-agent-libs/blob/01275338dc278335b4a8c0f23055014b4c0702dc/transport/logging.go#L48-L54
This appears to be called during the keep alive session maintenance every time data is read from the buffer? The last data in the buffer is the 200 OK from the server so we are receiving the full response.
When the connection is torn down, this receives an errNetClosing instead of an EOF (likely because we are the ones closing the connection and not the server). This code path only checks for EOF and thus we get the
use of closed network connection
err debug logged during the read. This error message “pollutes” our debug logs and leads customers to believe there is a network issue.