Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: do not retry for 4xx errors #950

Merged
merged 4 commits into from
Sep 15, 2024
Merged

Conversation

TomerHeber
Copy link
Collaborator

Issue & Steps to Reproduce / Feature Request

fixes #949

Solution

Removed the default error handler.

Copy link
Contributor

@RLRabinowitz RLRabinowitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test for this please?

@TomerHeber
Copy link
Collaborator Author

TomerHeber commented Sep 8, 2024

Can we add a test for this please?

@RLRabinowitz

I have checked... right now there are no tests for the provider http client.
Adding these is an effort, but I'm not not sure we would gain anything from it.

There are these tests: client/http/client_test.go - but they don't really test the provider client... (which is created in providor.go).

I guess the question is: what are we trying to test?

we could test this specific function, by passing responses with different status codes:

			AddRetryCondition(func(r *resty.Response, err error) bool {
				if r == nil {
					// No response. Possibly a networking issue (E.g. DNS lookup failure).
					tflog.SubsystemWarn(subCtx, "env0_api_client", "No response, retrying request")
					return true
				}

				// When running integration tests 404 may occur due to "database eventual consistency".
				// Retry when there's a 5xx error. Otherwise do not retry.
				if r.StatusCode() >= 500 || (isIntegrationTest && r.StatusCode() == 404) {
					tflog.SubsystemWarn(subCtx, "env0_api_client", "Received a failed or not found response, retrying request", map[string]interface{}{"method": r.Request.Method, "url": r.Request.URL, "status code": r.StatusCode()})
					return true
				}

				if r.StatusCode() == 200 && isIntegrationTest && r.String() == "[]" {
					tflog.SubsystemWarn(subCtx, "env0_api_client", "Received an empty list , retrying request", map[string]interface{}{"method": r.Request.Method, "url": r.Request.URL})
					return true
				}

				return false
			})

move this code to a dedicated, function... then write unit tests that each test case passes a different response object to this function.

Let me know if it's something you would like to me to do.

@avnerenv0
Copy link
Contributor

Didn't we add this to make the integration tests more stable?
Like we had issues with weak consistency and we needed the retries for them to pass?

@TomerHeber
Copy link
Collaborator Author

Didn't we add this to make the integration tests more stable? Like we had issues with weak consistency and we needed the retries for them to pass?

@avnerenv0

We did...
The question is: if you expect customers to run into the same eventual consistency issues?
If the answer is 'yes', I may want to make some additional (minor) modifications.

@TomerHeber
Copy link
Collaborator Author

Didn't we add this to make the integration tests more stable? Like we had issues with weak consistency and we needed the retries for them to pass?

@avnerenv0

We did... The question is: if you expect customers to run into the same eventual consistency issues? If the answer is 'yes', I may want to make some additional (minor) modifications.

I removed the line that would retry for any error... and kept the one that retries only on specific use-cases.
But if 404 'false-positive' is something customers may encounter as well... I would like to expend it not just for integration tests.

@avnerenv0
Copy link
Contributor

no that's ok.
The integration tests sometimes create a resource and try to read it in the same file, that's not a pattern we need to support in real life.

@RLRabinowitz I'm good with this PR but I saw you also commented, so please approve if you are fine with the response.

@RLRabinowitz
Copy link
Contributor

I guess the question is: what are we trying to test?

Well, we want to make sure that we are not retrying any 4xx HTTP errors, right? Make sure the scenario mentioned in the issue will not happen again.
In this specific example - if a deploy fails with 4xx, we shouldn't retry it

Adding a UT on the retry condition doesn't really help, as the fix was unrelated to that code.

Ideally we'd have a test that actually makes sure that this doesn't happen again, as right now we have no guardrails saving us from this happening again

Couldn't such a test simply be added to client_test.go?

@TomerHeber
Copy link
Collaborator Author

I guess the question is: what are we trying to test?

Well, we want to make sure that we are not retrying any 4xx HTTP errors, right? Make sure the scenario mentioned in the issue will not happen again. In this specific example - if a deploy fails with 4xx, we shouldn't retry it

Adding a UT on the retry condition doesn't really help, as the fix was unrelated to that code.

Ideally we'd have a test that actually makes sure that this doesn't happen again, as right now we have no guardrails saving us from this happening again

Couldn't such a test simply be added to client_test.go?

Ok sure.
I'll have to modify client_test.go since it tests an empty resty instance and not the one used by the provider.
This will require some minor refactoring but nothing too big. (I hope).


return func(ctx context.Context, d *schema.ResourceData) (interface{}, diag.Diagnostics) {
restyClient := resty.New()
func createRestyClient(ctx context.Context) *resty.Client {
Copy link
Collaborator Author

@TomerHeber TomerHeber Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved it to a function - this allows me to test almost the exact same resty.client that will be used in the provider.

@TomerHeber
Copy link
Collaborator Author

I guess the question is: what are we trying to test?

Well, we want to make sure that we are not retrying any 4xx HTTP errors, right? Make sure the scenario mentioned in the issue will not happen again. In this specific example - if a deploy fails with 4xx, we shouldn't retry it
Adding a UT on the retry condition doesn't really help, as the fix was unrelated to that code.
Ideally we'd have a test that actually makes sure that this doesn't happen again, as right now we have no guardrails saving us from this happening again
Couldn't such a test simply be added to client_test.go?

Ok sure. I'll have to modify client_test.go since it tests an empty resty instance and not the one used by the provider. This will require some minor refactoring but nothing too big. (I hope).

@RLRabinowitz I have added unit tests as requested.

@github-actions github-actions bot added ready to merge PR approved - can be merged once the PR owner is ready and removed pending final review labels Sep 15, 2024
@TomerHeber TomerHeber merged commit 8254da0 into main Sep 15, 2024
5 checks passed
@TomerHeber TomerHeber deleted the fix-remove-retry-for-4xx-#949 branch September 15, 2024 13:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix provider ready to merge PR approved - can be merged once the PR owner is ready
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extra api call retries for 422
3 participants