-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ERR README.md #3474
base: main
Are you sure you want to change the base?
ERR README.md #3474
Conversation
Pull Request Test Coverage Report for Build 11883975114Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
This is to define a new Test for ERR (Extended Route retention). This test demonstrates a new behavior for Graceful restart which is an extension to the existing behavior explained in RFC4724 and RFC8538. This test also relies on, - Changes proposed to the gNOI.bgp proto in openconfig/gnoi#214 - Also relies on some OC paths for ERR which arent available yet.
Formatting changes
Updated the Yaml data
|
||
## Summary | ||
|
||
This is an extension to the RFC8538 tests already conducted under "RT-1.4: BGP Graceful Restart". However, ERR is for projects that need to extend the validity of a route beyond the expiration of the stale routes timer for the BGP GR process. Following are the scenarios when ERR can be considered by a project. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does project mean here? I think it just means a BGP speaker right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected!
Since the route retention is purely local action of the receiving speaker, this action should not require any additional capabilities advertisements beyond capability 64 (Graceful Restart), and should not be confused with or require capability 71 (Long-Lived Graceful Restart) from the sending speaker. | ||
|
||
**How is this different from LLGR as tested in RT-1.14?** | ||
As per the [IETF Draft on LLGR](https://tools.ietf.org/html/draft-ietf-idr-long-lived-gr), we have the following that is different from EER. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a useful section, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack!
B <-- EBGP(ASN200) --> C[Port2:ATE]; | ||
``` | ||
* ATE:Port1 runs IBGP and must advertise the following IPv4 and IPv6 prefixes with the corresponding commiunitty attributes | ||
* IPv4Prefix1 and IPv6Prefix1 with community NO-ERR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it matter what these community values are? I assume not?
Do they need to just be standard communities?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No they aren't. The use case seems to be for Private communities.
* DUT has the following configuration on its IBGP and EBGP peering | ||
* Extended route retention (ERR) enabled. | ||
* ERR configuration has the retention time of 300 secs configured | ||
* ERR has a retention-policy `STALE-ROUTE-POLICY` attached. The DUT **MUST** apply this policy on post-policy adj-rib-in routes. Therefore, import export policy applied to the routes must not be overridden by this policy, it MUST instead be additive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you define somewhere what the retention policy is defined to be and how it applies to routes? Is it applied by default to all sessions?
I think you are saying that there's a specific order of operations for these policies -- and they are NOT part of the normal import/export policy chain. Is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a section under Procedure explaining how the ERR policy differentiates from the regular import/export policy. However, I now consolidated and added more details under the summary section w/ a new subsection for ERR policy. PTAL
* identify routes tagged with community `NO-ERR` and have an action of "REJECT" so such routes aren't considered for ERR but only GR | ||
* identify routes tagged with community `ERR-NO-DEPREF` and have an action of "ACCEPT" so such routes are considered for ERR. Also ADD community `STALE` to the existing community list attached as part of the regular adj-rib-in post policy for the route. | ||
* Catch-all rule to identify and accept all other prefixes, attach a local-preference of "0" and ADD community `STALE` to the existing community list. | ||
* DUT has import-policy importibgp and export-policy exportibgp towards the IBGP neighbor applied in the import and export directions respectively. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that it might be easier to just define these policies in OC ;-), since you're basically using vendor-neutral text to be able to define what you need. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had it already defined in the Yaml section at the bottom. However, I revisited that to look for missing ones.
* Test Flows used for verification | ||
* IPv4Prefix1 <-> IPv4Prefix4, IPv6Prefix1 <-> IPv6Prefix4 | ||
* IPv4Prefix2 <-> IPv4Prefix5, IPv6Prefix2 <-> IPv6Prefix5 | ||
* IPv4Prefix3 <-> IPv4Prefix6, IPv6Prefix3 <-> IPv6Prefix6<br><br><br> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can remove all <br>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack, removed all
|
||
```TODO: gNOI.ClearBGPNeighborRequest_GRACEFUL used in this case is under review in https://github.com/openconfig/gnoi/pull/214``` | ||
|
||
***`a. Expected behavior when "Administrative Reset" Notification (rfc4486) sent to the peer while the ERR policy is attached to the neighborship.`*** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this scenario -- who is implementing ERR? It seems like not the DUT? Are we expecting that the DUT does ERR even when it itself triggers the restart?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. Intent of sending code 6 subcode 4 while peers have exchanged the "N" bit in the RESET Flag of Capability 64, is for both peers to reset the TCP session but not flush routes until the configured Retention period. They will add the other attributes matching the ACCEPT action of the ERR policy for routes learnt from each other and advertise accordingly to their respective peers. Tl;dr: Both on either side of the disrupted TCP will need to practice ERR for seamless operation of the impacted routes as expected.
***`a. Expected behavior when "Administrative Reset" Notification (rfc4486) sent to the peer while the ERR policy is attached to the neighborship.`*** | ||
* Start traffic as per the flows above | ||
* Trigger BGP Notification (code 6 subocde 4) from DUT:Port1 towards ATE:Port1. Please use the `gNOI.ClearBGPNeighborRequest_GRACEFUL` message. | ||
* Cease notification of Code 6, subcode 4 will result in tcp connection reset but the routes aren't flushed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the "routes aren't flushed" applicable to the ATE session?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, ERR is an extension to RFC8538 which is again an extension to RFC4724. While the last introduced GR, it excluded NOTIFICATION messages. Hence implementations supporting RFC4724 will flush routes for all kinds of NOTIFICATION messages. RFC8538 is an update to include NOTIFICATION messages as well part of GR. However, for RFC8538 it became necessary to introduce a new NOTIFICATION message called "HARD RESET" to indeed allow for flushing of routes voluntarily (more in https://datatracker.ietf.org/doc/html/rfc8538#section-5.1). Hence, not all messages cause Route flush. While "Administrative RESET" can cause TCP termination, voluntary route flush is possible w/ HARD_RESET (Code 6 subcode 9).
Hope this explains and differentiates the case this Sub test case is testing from actual Route flush scenarios demonstrated in other sub tests.
```TODO: gNOI.ClearBGPNeighborRequest_GRACEFUL used in this case is under review in https://github.com/openconfig/gnoi/pull/214``` | ||
|
||
***`a. Expected behavior when "Administrative Reset" Notification (rfc4486) received from the peer while ERR policy is attached on the neighborship.`*** | ||
* Follow the same procedure as RT-1.35.6.a above. However this time, Trigger BGP Notification (code 6 subocde 4) from ATE:Port1 towards DUT:Port1. Please use the `gNOI.ClearBGPNeighborRequest_GRACEFUL` message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same questions as above w.r.t who is triggering the session reset. In all the cases I am reading that the DUT is doing a reset because of the reference to gNOI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So AIUI, if an implementation sends Administrative reset, it knows it sent Code6 subcode 4 and both the peers have negotiated "N" bit already in the RESET flag of the Capability 64 message. Therefore, while the TCP is reset, both do not flush the routes learnt from each other. Now based on the ERR policy, some routes (w/ Accept action) will experience extended retention while some routes (w/ Reject action) will be held only until the RESTART timer expires. Beyond the RESTART timer, if "STALE TIMER" timer is configured (https://datatracker.ietf.org/doc/html/rfc8538#section-4.1), the routes will stay post which they are flushed.
Updated to 80 Characters
Added ERR policy details under summary
Updated the Yaml definition to include OC paths for config and STATE
Adding ":" to paths defined under Yaml
Update Yaml file to run the checks successfully. Path for "config/reject-route" had duplicates.
Some more corrections to Yaml paths and rpc
More corrections to the YAML section
Formatting
Added Yaml details