Storage nodes can limit the number of in-flight requests they handle at once using the STORJ_STORAGE2_MAX_CONCURRENT_REQUESTS config (storage2.max-concurrent-requests).
gateway-mt has a macaroon limiter that limits concurrent requests. This limit is statically set, but we've seen a need to adjust it dynamically as storage network conditions change. One idea is to adjust it based on errors coming back from storage nodes.
We could take inspiration from a TCP congestion control algorithm like AIMD (additive increase/multiplicative decrease): apply a multiplicative decrease (down to a defined minimum) to macaroon limits when more nodes start returning a limit error, and increase the limits again (up to a defined maximum) as nodes stop returning the error. Perhaps a timer watches the number of limit errors in a given window to decide whether limits should be adjusted.
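As a rough illustration of the AIMD idea, here is a minimal sketch in Go. Everything in it is hypothetical: the `AIMDLimit` type, its methods, the choice to start at the maximum, and the per-window error threshold are assumptions for discussion, not existing gateway-mt or uplink APIs.

```go
package main

import (
	"math"
	"sync"
	"time"
)

// AIMDLimit is a hypothetical adaptive concurrency limit.
// None of these names exist in gateway-mt; they are illustrative only.
type AIMDLimit struct {
	mu        sync.Mutex
	limit     float64 // current limit, kept as a float so small steps accumulate
	min, max  float64 // bounds the limit never leaves
	increase  float64 // additive step applied after a healthy window
	decrease  float64 // multiplicative factor in (0, 1) applied after a congested window
	threshold int     // limit errors per window that count as congestion
	errors    int     // limit errors seen in the current window
}

// NewAIMDLimit starts at max (an assumption) and backs off from there.
func NewAIMDLimit(min, max, increase, decrease float64, threshold int) *AIMDLimit {
	return &AIMDLimit{
		limit: max, min: min, max: max,
		increase: increase, decrease: decrease, threshold: threshold,
	}
}

// RecordLimitError notes one limit error returned by a storage node.
func (a *AIMDLimit) RecordLimitError() {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.errors++
}

// Adjust closes the current window, applies AIMD, and returns the new limit.
func (a *AIMDLimit) Adjust() int {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.errors >= a.threshold {
		a.limit = math.Max(a.min, a.limit*a.decrease) // multiplicative decrease
	} else {
		a.limit = math.Min(a.max, a.limit+a.increase) // additive increase
	}
	a.errors = 0
	return int(a.limit)
}

// Limit returns the current limit without closing the window.
func (a *AIMDLimit) Limit() int {
	a.mu.Lock()
	defer a.mu.Unlock()
	return int(a.limit)
}

// Run calls Adjust once per window until stop is closed.
func (a *AIMDLimit) Run(window time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(window)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			a.Adjust()
		case <-stop:
			return
		}
	}
}
```

With `NewAIMDLimit(10, 100, 5, 0.5, 3)`, a window containing three or more limit errors halves the limit, and each quiet window adds 5 back, always staying between 10 and 100. How the resulting value would be fed into the macaroon limiter is left open here.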
To consider:
Should we handle stall detection logic in the gateway? https://review.dev.storj.io/c/storj/uplink/+/15489 introduces a stall manager to uplink in response to other cases where storage nodes stall on uploads, but it may make sense to move this logic closer to congestion control itself, for example by decreasing limits when we start seeing stalls.
Should we keep track of round-trip times (perhaps at the level of individual storage node piece operations) as another congestion signal?
halkyon changed the title from "Dymamically adjust macaroon limiter based on errors from storage nodes" to "[draft] Dymamically adjust macaroon limiter based on errors from storage nodes" on Dec 16, 2024
halkyon changed the title from "[draft] Dymamically adjust macaroon limiter based on errors from storage nodes" to "[draft] Dynamically adjust macaroon limiter based on errors from storage nodes" on Dec 16, 2024