Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FUN-1313] implement retry strategy for external adapter #12391

Merged
merged 11 commits into from
Mar 15, 2024

Conversation

agparadiso
Copy link
Contributor

@agparadiso agparadiso commented Mar 12, 2024

Description

This PR implements a retry strategy with exponential backoff for external adapter requests.
It will retry up to 5 times in a exponential way, and fail in case this does not fixes it.

ℹ️ It adds a dependency to the hashicorp/go-retryablehttp library.

Testing

This has been covered with several scenarios to validate the configuration of the retry strategy is correct. We will only retry on 5XX errors, given it does not make sense to retry eg on a BadRequest error.

Configuration of Retrial and ExponentialBackoff has been tested on staging with logs like this: log

Copy link
Contributor

I see that you haven't updated any README files. Would it make sense to do so?

@agparadiso agparadiso force-pushed the feat/FUN-1313_retry_req_external_adapter branch from 1a4bbbe to 5b52f2a Compare March 12, 2024 13:32
@agparadiso agparadiso marked this pull request as ready for review March 12, 2024 16:49
@agparadiso agparadiso requested review from a team as code owners March 12, 2024 16:49
@@ -56,6 +56,7 @@ const (
FunctionsBridgeName string = "ea_bridge"
FunctionsS4Namespace string = "functions"
MaxAdapterResponseBytes int64 = 1_000_000
MaxAdapterRetry int = 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may be able to move this into the jobspec.

@@ -190,7 +194,10 @@ func (ea *externalAdapterClient) request(
req.Header.Set("Content-Type", "application/json")

start := time.Now()
client := &http.Client{}
retryClient := retryablehttp.NewClient()
retryClient.RetryMax = ea.maxRetry
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a config value for the base delay between each retry attempt? I know it is exponential backoff, but I would think there is some base number they use to calculate the delay between each attempt. It would be nice if this value was also configurable from the jobspec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼 added a exponentialBackoffBaseSec parameter in which we can specify the base delay

@agparadiso agparadiso force-pushed the feat/FUN-1313_retry_req_external_adapter branch from d136b62 to acb7580 Compare March 13, 2024 14:53
@agparadiso agparadiso requested a review from KuphJr March 13, 2024 15:22
Copy link
Contributor

@bolekk bolekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

maxResponseBytes int64
adapterURL url.URL
maxResponseBytes int64
maxRetry int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "maxRetries"

bridgeORM bridges.ORM
bridgeName string
maxResponseBytes int64
maxRetry int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@@ -57,6 +57,8 @@ const (
FunctionsS4Namespace string = "functions"
MaxAdapterResponseBytes int64 = 1_000_000
DefaultOffchainTransmitterChannelSize uint32 = 1000
DefaultMaxAdapterRetry int = 3
DefaultExponentialBackoffBase = 1 * time.Second
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd start with a higher default, maybe 5 seconds? Also what happens if the request (computation) takes a long time? Is retry time measured from when previous response happened?

conf.Logger.Debugf("external adapter maxRetry configured to: %d", maxRetry)
} else {
maxRetry = DefaultMaxAdapterRetry
conf.Logger.Debugf("external adapter maxRetry configured to default: %d", maxRetry)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just have a single log message below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼 true! I wanted to make a clear difference between picking up the value from the jobspec vs using the default value, but if we are logging the actual value it should be enought!

@agparadiso agparadiso requested a review from bolekk March 14, 2024 11:49
bolekk
bolekk previously approved these changes Mar 15, 2024
conf.Logger.Debugf("external adapter exponentialBackoffBase configured to: %g sec", exponentialBackoffBase.Seconds())
} else {
exponentialBackoffBase = DefaultExponentialBackoffBase
conf.Logger.Debugf("external adapter exponentialBackoffBase configured to default: %g sec", exponentialBackoffBase.Seconds())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make a corresponding change to these logs, for consistency.

@agparadiso agparadiso added this pull request to the merge queue Mar 15, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 15, 2024
@agparadiso agparadiso added this pull request to the merge queue Mar 15, 2024
Merged via the queue into develop with commit d0a88b8 Mar 15, 2024
97 of 98 checks passed
@agparadiso agparadiso deleted the feat/FUN-1313_retry_req_external_adapter branch March 15, 2024 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants