🐛 Bug Report
📝 Description
When a data fetcher pod loses its connection to the network, it is still considered healthy and continues receiving traffic. As a result, it keeps retrying network requests until the timeout is reached. Example logs in such a case:
{"code":"Unknown system error -116","context":"BlockchainService","level":"error","message":"getaddrinfo Unknown system error -116 <BLOCKCHAIN URL>","ms":"+30s","stack":["Error: getaddrinfo Unknown system error -116 <BLOCKCHAIN URL>
at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:108:26)"],"timestamp":"2025-01-06T09:05:25.035Z"}
{"context":"BlockchainService","functionName":"getBlockDetails","level":"error","message":"Exceeded retries total timeout, failing the request","ms":"+0ms","stack":[null],"timestamp":"2025-01-06T09:05:25.035Z"}
The same problem can occur with the worker service.
🤔 Expected Behavior
The health check for the pod should fail, so the pod is considered unhealthy and gets replaced.
😯 Current Behavior
The pod is considered healthy and keeps receiving traffic.
📋 Additional Context
Suggested solution: Both the worker and data fetcher services already have a JsonRpcHealthIndicator. I suggest customizing this indicator so that it pings the blockchain at a configured interval (e.g., every 20 seconds) with a large timeout (e.g., 10 seconds) and updates an internal state variable. Then, when the isHealthy function is called, the value of the internal state is returned. The motivation behind this approach is to avoid spamming the network too frequently, which could be harmful when the network is under heavy load.
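A minimal sketch of the suggested approach, not the actual JsonRpcHealthIndicator implementation. The class name CachedHealthIndicator, the ping callback, and the refresh, start, and stop methods are all hypothetical; ping stands in for whatever cheap JSON-RPC call the service would use to reach the blockchain. The sketch also assumes the indicator reports unhealthy until the first probe succeeds, which is a design choice rather than something stated above.

```typescript
// Hypothetical sketch: a health indicator that probes in the background
// and answers health checks from cached state instead of hitting the network.
type Ping = () => Promise<unknown>;

class CachedHealthIndicator {
  private readonly ping: Ping;
  private readonly intervalMs: number;
  private readonly timeoutMs: number;
  private healthy = false; // assumption: unhealthy until the first successful probe
  private inFlight = false;
  private timer: ReturnType<typeof setInterval> | undefined;

  constructor(ping: Ping, intervalMs = 20_000, timeoutMs = 10_000) {
    this.ping = ping;
    this.intervalMs = intervalMs;
    this.timeoutMs = timeoutMs;
  }

  // Probe once immediately, then once per interval.
  start(): void {
    if (this.timer !== undefined) return;
    void this.refresh();
    this.timer = setInterval(() => void this.refresh(), this.intervalMs);
  }

  stop(): void {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
  }

  // Cheap and synchronous: returns the result of the most recent probe.
  isHealthy(): boolean {
    return this.healthy;
  }

  // Runs a single probe and updates the cached state.
  // Skipped if a previous probe is still running, so probes never pile up.
  async refresh(): Promise<void> {
    if (this.inFlight) return;
    this.inFlight = true;
    try {
      await this.withTimeout(this.ping(), this.timeoutMs);
      this.healthy = true;
    } catch {
      this.healthy = false;
    } finally {
      this.inFlight = false;
    }
  }

  // Rejects if the given promise does not settle within ms milliseconds.
  private withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const handle = setTimeout(() => reject(new Error("ping timed out")), ms);
      p.then(
        (value) => { clearTimeout(handle); resolve(value); },
        (error) => { clearTimeout(handle); reject(error); },
      );
    });
  }
}
```

With this shape, the existing health endpoint would only call isHealthy(), so the rate of network probes is set by the interval and does not depend on how often the orchestrator polls the pod.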