-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FUN-1313] implement retry strategy for external adapter #12391
[FUN-1313] implement retry strategy for external adapter #12391
Conversation
I see that you haven't updated any README files. Would it make sense to do so? |
1a4bbbe
to
5b52f2a
Compare
@@ -56,6 +56,7 @@ const ( | |||
FunctionsBridgeName string = "ea_bridge" | |||
FunctionsS4Namespace string = "functions" | |||
MaxAdapterResponseBytes int64 = 1_000_000 | |||
MaxAdapterRetry int = 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may be able to move this into the jobspec.
@@ -190,7 +194,10 @@ func (ea *externalAdapterClient) request( | |||
req.Header.Set("Content-Type", "application/json") | |||
|
|||
start := time.Now() | |||
client := &http.Client{} | |||
retryClient := retryablehttp.NewClient() | |||
retryClient.RetryMax = ea.maxRetry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a config value for the base delay between each retry attempt? I know it is exponential backoff, but I would think there is some base number they use to calculate the delay between each attempt. It would be nice if this value was also configurable from the jobspec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏼 added a exponentialBackoffBaseSec
parameter in which we can specify the base delay
d136b62
to
acb7580
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
maxResponseBytes int64 | ||
adapterURL url.URL | ||
maxResponseBytes int64 | ||
maxRetry int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "maxRetries"
bridgeORM bridges.ORM | ||
bridgeName string | ||
maxResponseBytes int64 | ||
maxRetry int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
@@ -57,6 +57,8 @@ const ( | |||
FunctionsS4Namespace string = "functions" | |||
MaxAdapterResponseBytes int64 = 1_000_000 | |||
DefaultOffchainTransmitterChannelSize uint32 = 1000 | |||
DefaultMaxAdapterRetry int = 3 | |||
DefaultExponentialBackoffBase = 1 * time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd start with a higher default, maybe 5 seconds? Also what happens if the request (computation) takes a long time? Is retry time measured from when previous response happened?
conf.Logger.Debugf("external adapter maxRetry configured to: %d", maxRetry) | ||
} else { | ||
maxRetry = DefaultMaxAdapterRetry | ||
conf.Logger.Debugf("external adapter maxRetry configured to default: %d", maxRetry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can just have a single log message below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏼 true! I wanted to make a clear difference between picking up the value from the jobspec vs using the default value, but if we are logging the actual value it should be enought!
conf.Logger.Debugf("external adapter exponentialBackoffBase configured to: %g sec", exponentialBackoffBase.Seconds()) | ||
} else { | ||
exponentialBackoffBase = DefaultExponentialBackoffBase | ||
conf.Logger.Debugf("external adapter exponentialBackoffBase configured to default: %g sec", exponentialBackoffBase.Seconds()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can make a corresponding change to these logs, for consistency.
Quality Gate passedIssues Measures |
Description
This PR implements a retry strategy with exponential backoff for external adapter requests.
It will retry up to 5 times in a exponential way, and fail in case this does not fixes it.
ℹ️ It adds a dependency to the
hashicorp/go-retryablehttp
library.Testing
This has been covered with several scenarios to validate the configuration of the retry strategy is correct. We will only retry on 5XX errors, given it does not make sense to retry eg on a BadRequest error.
Configuration of Retrial and ExponentialBackoff has been tested on staging with logs like this: log