Add network health check to both worker and data fetcher services #368

vasyl-ivanchuk · 2025-01-06T16:29:46Z

🐛 Bug Report

📝 Description

When a data fetcher pod loses its connection to the network, it is still considered healthy and continues receiving traffic. As a result, it keeps retrying network requests until the timeout is reached. Example logs in such a case:

{"code":"Unknown system error -116","context":"BlockchainService","level":"error","message":"getaddrinfo Unknown system error -116 <BLOCKCHAIN URL>","ms":"+30s","stack":["Error: getaddrinfo Unknown system error -116 <BLOCKCHAIN URL>\n    at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:108:26)"],"timestamp":"2025-01-06T09:05:25.035Z"}
{"context":"BlockchainService","functionName":"getBlockDetails","level":"error","message":"Exceeded retries total timeout, failing the request","ms":"+0ms","stack":[null],"timestamp":"2025-01-06T09:05:25.035Z"}

The same problem can happen with the worker service.

🤔 Expected Behavior

The health check for the pod should fail, so the pod is considered unhealthy and gets replaced.

😯 Current Behavior

The pod is considered healthy and keeps receiving traffic.

📋 Additional Context

Suggested solution: Both the worker and data fetcher services already have a JsonRpcHealthIndicator. I suggest customizing this indicator so that it pings the blockchain at a configured interval (e.g., every 20 seconds) with a large timeout (e.g., 10 seconds) and updates an internal state variable. Then, when the isHealthy function is called, the value of the internal state is returned. The motivation behind this approach is to avoid spamming the network too frequently, which could be harmful when the network is under heavy load.
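
The suggested approach could look roughly like the sketch below. It is only an illustration under stated assumptions: `CachedNetworkHealthIndicator`, `PingFn`, and `withTimeout` are hypothetical names that are not taken from the services' code, and the `ping` callback stands in for whatever lightweight blockchain request the real JsonRpcHealthIndicator would issue. The 20-second interval and 10-second timeout are the example values from the description above.

```typescript
// Hypothetical sketch: none of these names come from the actual code base.
type PingFn = () => Promise<unknown>;

// Rejects if `promise` does not settle within `timeoutMs` milliseconds.
function withTimeout<T>(promise: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const handle = setTimeout(() => reject(new Error("ping timed out")), timeoutMs);
    promise.then(
      (value) => { clearTimeout(handle); resolve(value); },
      (error) => { clearTimeout(handle); reject(error); },
    );
  });
}

class CachedNetworkHealthIndicator {
  // Assumption: start optimistic so a pod is not failed before its first ping completes.
  private healthy = true;
  private timer: ReturnType<typeof setInterval> | undefined;

  constructor(
    private readonly ping: PingFn,
    private readonly intervalMs = 20_000, // example interval from the issue
    private readonly timeoutMs = 10_000,  // example timeout from the issue
  ) {}

  // Begins periodic pinging; calling start() twice has no extra effect.
  start(): void {
    if (this.timer !== undefined) return;
    void this.refresh();
    this.timer = setInterval(() => void this.refresh(), this.intervalMs);
  }

  stop(): void {
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
  }

  // Pings once and stores the outcome in the internal state.
  async refresh(): Promise<void> {
    try {
      await withTimeout(this.ping(), this.timeoutMs);
      this.healthy = true;
    } catch {
      this.healthy = false;
    }
  }

  // Returns the cached state without touching the network.
  isHealthy(): boolean {
    return this.healthy;
  }
}
```

With this shape, the health endpoint only reads the cached flag, so frequent probe calls add no load on the blockchain node. Whether the initial state should be healthy or unhealthy, and whether a single failed ping should be enough to fail the pod, are design choices the sketch does not settle.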


vasyl-ivanchuk added the bug and backend labels on Jan 6, 2025

vasyl-ivanchuk mentioned this issue on Jan 17, 2025 in: fix: health checks for worker and fetcher #374 (open)
