Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V7 Add transient error handling #2281

Open
cronchie opened this issue Nov 6, 2024 · 7 comments
Open

V7 Add transient error handling #2281

cronchie opened this issue Nov 6, 2024 · 7 comments
Assignees
Labels
3) awaiting proposal There is some discussion in issue and reach to some results but it's not concluded with clear propos V7 Temporary label for grouping logging related issues _5.0 - Not blocker This issue does not block 5.0 so if it gets addressed then great, if not then fine.

Comments

@cronchie
Copy link
Contributor

cronchie commented Nov 6, 2024

Draft requirement:

Verify that transient errors are handled with a retry mechanism that includes backoff and jitter, ensuring retries are spaced effectively to prevent overwhelming the system or external service, and avoiding retries for errors that are not transient.

I propose this has an availability impact because without a retry and backoff strategy, clients may repeatedly hammer a server when transient errors (like timeouts or temporary unavailability) occur. This could overwhelm the target service or network, creating a DoS. As for the inverse-- there's no point retrying other kinds of errors (AuthN / AuthZ / 400 Bad Request) and doing so can also impact availability.

@tghosth tghosth added 1) Discussion ongoing Issue is opened and assigned but no clear proposal yet _5.0 - prep This needs to be addressed to prepare 5.0 V7 Temporary label for grouping logging related issues labels Nov 6, 2024
@ryarmst
Copy link
Collaborator

ryarmst commented Nov 6, 2024

I think this is a good mechanism, but perhaps out of scope. There are many ways an app could be designed such that a sort of self-DoS could occur and there is not an immediate vector to abuse it.

@cronchie
Copy link
Contributor Author

cronchie commented Nov 6, 2024

There are many ways an app could be designed such that a sort of self-DoS could occur and there is not an immediate vector to abuse it.

Very true! Though I suggest that this is DoSing an external system. For instance, if a web app keeps retrying an operation with no backoff time, that could impact the service's availability.

@elarlang
Copy link
Collaborator

elarlang commented Nov 7, 2024

potentially related: #1778

@tghosth
Copy link
Collaborator

tghosth commented Nov 7, 2024

@cronchie could you take a look at #1778 and see if the content there is similar to what you mention here?

@elarlang
Copy link
Collaborator

elarlang commented Nov 9, 2024

I rechecked, that this issue is not a duplicate of #1778, although related.

From the issue description:

I propose this has an availability impact because, without a retry and backoff strategy, clients may repeatedly hammer a server when transient errors (like timeouts or temporary unavailability) occur. This could overwhelm the target service or network, creating a DoS. As for the inverse-- there's no point retrying other kinds of errors (AuthN / AuthZ / 400 Bad Request) and doing so can also impact availability.

Point of view: "save 3rd party service from DoS". Based on this description I would say, that every service must defend itself and can not rely on the hope their users are nice.

There are some risks for the application itself, when hammering other services - the application may get blocked from using the service - and this is a clear denial of service for the application itself (that is in scope for ASVS), not (only) for the external service (that is out of scope for ASVS).

How to solve the problem - as it may be done and achieved in many different ways, we just can highlight the problem in a documentation requirement.

We have a documentation requirement:

1.14.7 Verify that all communication needs for the application are documented. This should include external services which the application relies upon and cases where an end user might be able to provide an external location to which the application will then connect.

We can ask "Document how the connection to service should fail, and if retry is allowed, then retry strategy."

In case there is need to create a new documentation requirement for #1778, it may fit there better.

@tghosth tghosth added the next meeting Filter for leaders label Nov 11, 2024
@tghosth
Copy link
Collaborator

tghosth commented Nov 19, 2024

Ok. I have read back through this and I think that there are two main considerations here:

  1. How errors are handled between different components of the application in scope.
  2. How the app handles an external service not being available.

I actually think that the new 7.4.4 could handle both these situations with some tailoring.

I would therefore suggest the following:

# Description L1 L2 L3 CWE
7.4.4 [ADDED] Verify that the application is designed so that a failure to access internal or external resources does not result in the entire application failing. For example, by using a gradual retry mechanism or the circuit breaker pattern.

Thoughts @cronchie @elarlang ?

@tghosth tghosth added 4) proposal for review Issue contains clear proposal for add/change something and removed 1) Discussion ongoing Issue is opened and assigned but no clear proposal yet labels Nov 19, 2024
@elarlang
Copy link
Collaborator

I don't think 7.4.4 should cover the retry issue.

Retry actually works against it - for synchronous flow, as most of the HTTP request-responses are - having retry for connection is just increasing the risk that we try to solve here.

@tghosth tghosth added _5.0 - Not blocker This issue does not block 5.0 so if it gets addressed then great, if not then fine. 3) awaiting proposal There is some discussion in issue and reach to some results but it's not concluded with clear propos and removed _5.0 - prep This needs to be addressed to prepare 5.0 next meeting Filter for leaders 4) proposal for review Issue contains clear proposal for add/change something labels Nov 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3) awaiting proposal There is some discussion in issue and reach to some results but it's not concluded with clear propos V7 Temporary label for grouping logging related issues _5.0 - Not blocker This issue does not block 5.0 so if it gets addressed then great, if not then fine.
Projects
None yet
Development

No branches or pull requests

4 participants