Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ERR README.md #3474

Open
wants to merge 18 commits into
base: main
Choose a base branch
from
Open

ERR README.md #3474

wants to merge 18 commits into from

Conversation

sachendras
Copy link
Contributor

@sachendras sachendras commented Sep 28, 2024

Added Yaml details

@OpenConfigBot
Copy link

OpenConfigBot commented Sep 28, 2024

Pull Request Functional Test Report for #3474 / 1c97d6d

No tests identified for validation.

Help

@coveralls
Copy link

coveralls commented Sep 28, 2024

Pull Request Test Coverage Report for Build 11883975114

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 55.268%

Totals Coverage Status
Change from base Build 11509443560: 0.0%
Covered Lines: 1983
Relevant Lines: 3588

💛 - Coveralls

This is to define a new Test for ERR (Extended Route retention). This test demonstrates a new behavior for Graceful restart which is an extension to the existing behavior explained in RFC4724 and RFC8538.

This test also relies on,

- Changes proposed to the gNOI.bgp proto in openconfig/gnoi#214
- Also relies on some OC paths for ERR which arent available yet.
Formatting changes
Updated the Yaml data
@sachendras sachendras marked this pull request as ready for review September 29, 2024 05:00
@sachendras sachendras requested a review from dplore as a code owner September 29, 2024 05:00

## Summary

This is an extension to the RFC8538 tests already conducted under "RT-1.4: BGP Graceful Restart". However, ERR is for projects that need to extend the validity of a route beyond the expiration of the stale routes timer for the BGP GR process. Following are the scenarios when ERR can be considered by a project.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does project mean here? I think it just means a BGP speaker right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected!

Since the route retention is purely local action of the receiving speaker, this action should not require any additional capabilities advertisements beyond capability 64 (Graceful Restart), and should not be confused with or require capability 71 (Long-Lived Graceful Restart) from the sending speaker.

**How is this different from LLGR as tested in RT-1.14?**
As per the [IETF Draft on LLGR](https://tools.ietf.org/html/draft-ietf-idr-long-lived-gr), we have the following that is different from EER.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a useful section, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack!

B <-- EBGP(ASN200) --> C[Port2:ATE];
```
* ATE:Port1 runs IBGP and must advertise the following IPv4 and IPv6 prefixes with the corresponding commiunitty attributes
* IPv4Prefix1 and IPv6Prefix1 with community NO-ERR
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it matter what these community values are? I assume not?

Do they need to just be standard communities?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No they aren't. The use case seems to be for Private communities.

* DUT has the following configuration on its IBGP and EBGP peering
* Extended route retention (ERR) enabled.
* ERR configuration has the retention time of 300 secs configured
* ERR has a retention-policy `STALE-ROUTE-POLICY` attached. The DUT **MUST** apply this policy on post-policy adj-rib-in routes. Therefore, import export policy applied to the routes must not be overridden by this policy, it MUST instead be additive.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you define somewhere what the retention policy is defined to be and how it applies to routes? Is it applied by default to all sessions?

I think you are saying that there's a specific order of operations for these policies -- and they are NOT part of the normal import/export policy chain. Is that right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a section under Procedure explaining how the ERR policy differentiates from the regular import/export policy. However, I now consolidated and added more details under the summary section w/ a new subsection for ERR policy. PTAL

* identify routes tagged with community `NO-ERR` and have an action of "REJECT" so such routes aren't considered for ERR but only GR
* identify routes tagged with community `ERR-NO-DEPREF` and have an action of "ACCEPT" so such routes are considered for ERR. Also ADD community `STALE` to the existing community list attached as part of the regular adj-rib-in post policy for the route.
* Catch-all rule to identify and accept all other prefixes, attach a local-preference of "0" and ADD community `STALE` to the existing community list.
* DUT has import-policy importibgp and export-policy exportibgp towards the IBGP neighbor applied in the import and export directions respectively.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that it might be easier to just define these policies in OC ;-), since you're basically using vendor-neutral text to be able to define what you need. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had it already defined in the Yaml section at the bottom. However, I revisited that to look for missing ones.

* Test Flows used for verification
* IPv4Prefix1 <-> IPv4Prefix4, IPv6Prefix1 <-> IPv6Prefix4
* IPv4Prefix2 <-> IPv4Prefix5, IPv6Prefix2 <-> IPv6Prefix5
* IPv4Prefix3 <-> IPv4Prefix6, IPv6Prefix3 <-> IPv6Prefix6<br><br><br>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can remove all <br>.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, removed all


```TODO: gNOI.ClearBGPNeighborRequest_GRACEFUL used in this case is under review in https://github.com/openconfig/gnoi/pull/214```

***`a. Expected behavior when "Administrative Reset" Notification (rfc4486) sent to the peer while the ERR policy is attached to the neighborship.`***
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this scenario -- who is implementing ERR? It seems like not the DUT? Are we expecting that the DUT does ERR even when it itself triggers the restart?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. Intent of sending code 6 subcode 4 while peers have exchanged the "N" bit in the RESET Flag of Capability 64, is for both peers to reset the TCP session but not flush routes until the configured Retention period. They will add the other attributes matching the ACCEPT action of the ERR policy for routes learnt from each other and advertise accordingly to their respective peers. Tl;dr: Both on either side of the disrupted TCP will need to practice ERR for seamless operation of the impacted routes as expected.

***`a. Expected behavior when "Administrative Reset" Notification (rfc4486) sent to the peer while the ERR policy is attached to the neighborship.`***
* Start traffic as per the flows above
* Trigger BGP Notification (code 6 subocde 4) from DUT:Port1 towards ATE:Port1. Please use the `gNOI.ClearBGPNeighborRequest_GRACEFUL` message.
* Cease notification of Code 6, subcode 4 will result in tcp connection reset but the routes aren't flushed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the "routes aren't flushed" applicable to the ATE session?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, ERR is an extension to RFC8538 which is again an extension to RFC4724. While the last introduced GR, it excluded NOTIFICATION messages. Hence implementations supporting RFC4724 will flush routes for all kinds of NOTIFICATION messages. RFC8538 is an update to include NOTIFICATION messages as well part of GR. However, for RFC8538 it became necessary to introduce a new NOTIFICATION message called "HARD RESET" to indeed allow for flushing of routes voluntarily (more in https://datatracker.ietf.org/doc/html/rfc8538#section-5.1). Hence, not all messages cause Route flush. While "Administrative RESET" can cause TCP termination, voluntary route flush is possible w/ HARD_RESET (Code 6 subcode 9).
Hope this explains and differentiates the case this Sub test case is testing from actual Route flush scenarios demonstrated in other sub tests.

```TODO: gNOI.ClearBGPNeighborRequest_GRACEFUL used in this case is under review in https://github.com/openconfig/gnoi/pull/214```

***`a. Expected behavior when "Administrative Reset" Notification (rfc4486) received from the peer while ERR policy is attached on the neighborship.`***
* Follow the same procedure as RT-1.35.6.a above. However this time, Trigger BGP Notification (code 6 subocde 4) from ATE:Port1 towards DUT:Port1. Please use the `gNOI.ClearBGPNeighborRequest_GRACEFUL` message.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same questions as above w.r.t who is triggering the session reset. In all the cases I am reading that the DUT is doing a reset because of the reference to gNOI.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So AIUI, if an implementation sends Administrative reset, it knows it sent Code6 subcode 4 and both the peers have negotiated "N" bit already in the RESET flag of the Capability 64 message. Therefore, while the TCP is reset, both do not flush the routes learnt from each other. Now based on the ERR policy, some routes (w/ Accept action) will experience extended retention while some routes (w/ Reject action) will be held only until the RESTART timer expires. Beyond the RESTART timer, if "STALE TIMER" timer is configured (https://datatracker.ietf.org/doc/html/rfc8538#section-4.1), the routes will stay post which they are flushed.

Updated to 80 Characters
Added ERR policy details under summary
Updated the Yaml definition to include OC paths for config and STATE
Adding ":" to paths defined under Yaml
Update Yaml file to run the checks successfully. Path for "config/reject-route" had duplicates.
Some more corrections to Yaml paths and rpc
More corrections to the YAML section
Formatting
@sachendras sachendras changed the title Create README.md ERR README.md Nov 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants