Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zebra: return fully-resolved route to NHT clients #16793

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

enkechen-panw
Copy link
Contributor

@enkechen-panw enkechen-panw commented Sep 11, 2024

Brief summary: Currently recursive route + fully-resolved nexthop are sent to
NHT client. Proposed fix is to send fully-resolved (route + nexthop) instead.


When a nexthop being tracked is resolved to a recursive route at the
first level of recursion, several parameters (such as the prefix,
metric and protocol type) of the recursive route are sent to clients,
along with the fully-resolved nexthop.

This creates an inconsistency or de-coupling between the route and its
nexthop, that is, the fully-resolved nexthop would belong to a different
route that is non-recursive. Thus the update message to clients would
contain parameters like prefix, metric and protocl type from a recursive
route, and the nexthop from a non-recursive route.

In particular, the protocol type and the metric fields in the update
message no longer reflect the property of the fully-resolved nexthop in
the meesage. It would be diffcult for a client to make sense of these
parameters.

As an example, when the nexthop being tracked is resolved to a BGP route,
instead of further resolving to an IGP route and getting its metric, the
MED value of the BGP route would be taken as the metric and give to BGP
for bestpath calculation.

The fix is to always calculate and return the fully-resolved route to
clients so that the route parameters like protocol type, distance,
metric and the resolved nexthops are consistent.

When a nht nexthop is resolved recursively, associate the nht nexthop
with each of the recursive routes so that the nht entry is re-evaluated
when there is a change to any of the dependent routes.

@donaldsharp
Copy link
Member

This is pretty much a hard no from me. Having BGP act as a underlay for itself is something that is allowed and frankly lots of people are using. Can you explain your reasoning on why zebra is sending the wrong metric value up?

@enkechen-panw
Copy link
Contributor Author

Failure case:
When the nexthop is resolved incompletely to a BGP route, the MED value (installed as metric in zebra) is taken as the IGP metric, as shown from the output of the simple topotest included in the patch:

DEBUG:r1:cmd_status("/bin/bash -c 'vtysh -c '"'"'show bgp ipv4 unicast 10.0.0.4/32 json'"'"' 2>/dev/null'")
DEBUG:r1:vtysh result:
{
"prefix":"10.0.0.4/32",
"bestpath":{
"overall":true,
"selectionReason":"Router ID" <<<<<<
},
"nexthops":[
{
"ip":"192.168.24.4",
"hostname":"r2",
"afi":"ipv4",
"metric":0, <<<<<<<
"accessible":true,
"used":true
}
],
"nexthops":[
{
"ip":"192.168.34.4",
"hostname":"r3",
"afi":"ipv4",
"metric":0, <<<<<<<<
"accessible":true,
"used":true
}
],
}

@enkechen-panw
Copy link
Contributor Author

Don, as you said "Having BGP act as a underlay for itself is something that is allowed and frankly lots of people are using". that's exactly when we run into the issue that the BGP bestpath is wrong.

Please take a look at the simple topotest.

@donaldsharp
Copy link
Member

You need to give me more data than a simple show command in order for me to understand the problem. Please show me the routes in zebra as well as in bgp as well as your config. Else I cannot see anything wrong. A metric of 0 in isolation without any other context is next to useless.

@enkechen-panw
Copy link
Contributor Author

The configs and the topology are all in the topotest I have included in the PR. I will get the zebra output.

@enkechen-panw
Copy link
Contributor Author

enkechen-panw commented Sep 11, 2024

I have attached the relevant output from the topotest:

output.fail.txt
output.working.txt

Please reference the topology and description in test_nht_double_recusion.py.

The additional output is generated by applying the following diff to the topotest:

diff.test.txt

@enkechen-panw
Copy link
Contributor Author

enkechen-panw commented Sep 13, 2024

Can I get some advice about this test failure:
I found it useful to display the prefix on which the nexthop is resolved in "show ip nht". So I added it in the output, but the extra text is causing the failure.

What is our recommendation for handling this case?
Should I just remove it from "show ip nht", but keep it in "show ip nht jason"?

2024-09-11 06:59:24,525 ERROR: topo: test failed at "test_all_protocol_startup/test_nht": r1 failed ip nht check:
--- Actual show ip nht
+++ Expected show ip nht
@@ -2,42 +2,34 @@
Resolve via default: on
1.1.1.1
resolved via static
- prefix 1.1.1.1/32
is directly connected, r1-eth1 (vrf default), weight 1
Client list: pbr(fd XX)

@ton31337
Copy link
Member

You have to fix the test too (to expect the same output as it's from the CLI).

@enkechen-panw
Copy link
Contributor Author

Thanks for the advice. Let me remove the extra text from the current PR, and handle it as a separate item.

@enkechen-panw
Copy link
Contributor Author

I had to rebase in order to update the style issues found by "frrbot". But so many tags have been added. Not sure what I have done wrong. Should I create a new PR?

@enkechen-panw
Copy link
Contributor Author

The branch got messed up during updates. Close this PR, and opened a new PR: #16832

zebra/zebra_rnh.c Outdated Show resolved Hide resolved
zebra/zebra_rnh.c Outdated Show resolved Hide resolved
zebra/zebra_rnh.c Outdated Show resolved Hide resolved
@donaldsharp
Copy link
Member

it's not clear to me at this point in time what will happen if we have multiple recursive resolutions across multiple protocols what will happen here. I still need to spend more time with this

@donaldsharp
Copy link
Member

imo this code is broken. If I add this:

r1(config)# ip route 192.168.12.2/32 192.168.12.99 r1-eth0
r1(config)# ip route 192.168.12.2/32 192.168.12.100 r1-eth0
r1(config)# ip route 192.168.24.0/24 192.168.12.2 r1-eth0

we get this:

r1# show ip route
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

O   10.0.0.1/32 [110/0] is directly connected, lo, weight 1, 00:04:17
L * 10.0.0.1/32 is directly connected, lo, weight 1, 00:04:17
C>* 10.0.0.1/32 is directly connected, lo, weight 1, 00:04:17
O>* 10.0.0.2/32 [110/10] via 192.168.12.2, r1-eth0, weight 1, 00:04:07
O>* 10.0.0.3/32 [110/30] via 192.168.13.3, r1-eth1, weight 1, 00:04:07
B>  10.0.0.4/32 [200/0] via 192.168.24.4 (recursive), weight 1, 00:04:06
  *                       via 192.168.12.2, r1-eth0, weight 1, 00:04:06
B   192.168.12.0/24 [200/0] via 10.0.0.2 (recursive), weight 1, 00:04:06
                              via 192.168.12.2, r1-eth0, weight 1, 00:04:06
O   192.168.12.0/24 [110/10] is directly connected, r1-eth0, weight 1, 00:04:17
C>* 192.168.12.0/24 is directly connected, r1-eth0, weight 1, 00:04:17
L>* 192.168.12.1/32 is directly connected, r1-eth0, weight 1, 00:04:17
S>* 192.168.12.2/32 [1/0] via 192.168.12.99, r1-eth0, weight 1, 00:02:09
  *                       via 192.168.12.100, r1-eth0, weight 1, 00:02:09
B   192.168.13.0/24 [200/0] via 10.0.0.3 (recursive), weight 1, 00:04:06
                              via 192.168.13.3, r1-eth1, weight 1, 00:04:06
O   192.168.13.0/24 [110/30] is directly connected, r1-eth1, weight 1, 00:04:17
C>* 192.168.13.0/24 is directly connected, r1-eth1, weight 1, 00:04:17
L>* 192.168.13.1/32 is directly connected, r1-eth1, weight 1, 00:04:17
S>  192.168.24.0/24 [1/0] via 192.168.12.2, r1-eth0 (recursive), weight 1, 00:00:03
  *                         via 192.168.12.99, r1-eth0, weight 1, 00:00:03
  *                         via 192.168.12.100, r1-eth0, weight 1, 00:00:03
B   192.168.24.0/24 [200/0] via 10.0.0.2 (recursive), weight 1, 00:04:06
                              via 192.168.12.2, r1-eth0, weight 1, 00:04:06
B>  192.168.34.0/24 [200/0] via 10.0.0.3 (recursive), weight 1, 00:04:06
  *                           via 192.168.13.3, r1-eth1, weight 1, 00:04:06

and this:

r1# show ip nht
VRF default:
 Resolve via default: on
10.0.0.2
 resolved via ospf, prefix 10.0.0.2/32
 via 192.168.12.2, r1-eth0 (vrf default), weight 1
 Client list: bgp(fd 40)
10.0.0.3
 resolved via ospf, prefix 10.0.0.3/32
 via 192.168.13.3, r1-eth1 (vrf default), weight 1
 Client list: bgp(fd 40)
192.168.12.2
 resolved via static, prefix 192.168.12.2/32
 via 192.168.12.99, r1-eth0 (vrf default), weight 1
 via 192.168.12.100, r1-eth0 (vrf default), weight 1
 Client list: static(fd 30)
192.168.12.99
 resolved via connected, prefix 192.168.12.0/24
 is directly connected, r1-eth0 (vrf default), weight 1
 Client list: static(fd 30)
192.168.12.100
 resolved via connected, prefix 192.168.12.0/24
 is directly connected, r1-eth0 (vrf default), weight 1
 Client list: static(fd 30)
192.168.24.4
 resolved via ospf, prefix 10.0.0.2/32
 via 192.168.12.2, r1-eth0 (vrf default), weight 1
 Client list: bgp(fd 40)
192.168.34.4
 resolved via ospf, prefix 10.0.0.3/32
 via 192.168.13.3, r1-eth1 (vrf default), weight 1
 Client list: bgp(fd 40)
r1# show ip route 192.168.24.4
Routing entry for 192.168.24.0/24
  Known via "static", distance 1, metric 0, best
  Last update 00:02:35 ago
    192.168.12.2, via r1-eth0 (recursive), weight 1
  *   192.168.12.99, via r1-eth0, weight 1
  *   192.168.12.100, via r1-eth0, weight 1

Routing entry for 192.168.24.0/24
  Known via "bgp", distance 200, metric 0
  Last update 00:06:38 ago
    10.0.0.2 (recursive), weight 1
      192.168.12.2, via r1-eth0, weight 1

The 192.168.24.4 nht code should have switched over to the static routes. It is not doing this. This will seriously break allot of production code.

I also think that by only looking at the first nexthop you will end up in situations where it is ignoring a recursive route resolution that uses multiple paths. Additionally what happens if those multiple paths have different weights? It's not clear to me what weight should be used then.

@enkechen-panw enkechen-panw changed the title zebra: Resolve the nexthop being tracked to a non-BGP route zebra: resolve the nexthop recursively in NHT Sep 21, 2024
@donaldsharp
Copy link
Member

I'm really not convinced that this is an actual nexthop tracking problem that should be resolved this way. I really do not believe that we should be modifying what is sent up here in this manner. I would recommend implementing one of these two solutions instead:

a) When passing up a resolved nexthop for nexthop tracking add a metric value, per nexthop sent. That way the upper level protocol would get the metric of the route entry as well as the metric of the individual nexthops. Then an upper level protocol that cared could make a decision about what it wants to do with that information.

b) Modify the upper level protocol to notice that the metric for a particular nexthop is X then when only installing the route into zebra modify the sent down metric.

@enkechen-panw
Copy link
Contributor Author

Thanks @donaldsharp. I will study your idea and patch in #16917

@enkechen-panw enkechen-panw changed the title zebra: resolve the nexthop recursively in NHT zebra: calculate and return the IGP metric in NHT Sep 27, 2024
@enkechen-panw
Copy link
Contributor Author

enkechen-panw commented Sep 27, 2024

@donaldsharp The revised patch is along the line of a) in your recommendation. The metric field already exists when the nexthop is updated to the clients. The current mechanism is working for non-recursive nexthops. We just need to get the correct value for the recursive nexthops. It's good that no other fields in the rnh or logic need to be changed.

Thanks for your review and suggestions.

@enkechen-panw enkechen-panw changed the title zebra: calculate and return the IGP metric in NHT zebra: calculate the correct IGP metric for recursive nexthop in NHT Sep 27, 2024
@github-actions github-actions bot added size/XL and removed size/L labels Sep 27, 2024
@enkechen-panw
Copy link
Contributor Author

enkechen-panw commented Sep 28, 2024

@donaldsharp @ton31337 @mwinter-osr @riw777 The latest patch has much narrower scope. It only calculates the IGP metric without touching the existing "re" and the nexthop lists in the rnh structure.

Currently the IGP metric is already correct for non-recursive nexthops. The patch will make it correct for recursive nexthops as well.

Please review. Thanks.

@enkechen-panw
Copy link
Contributor Author

In addition to the route metric and nexthops, several other parameters (e.g., prefix and protocol type) are also sent to clients.
To be useful and consistent, all these parameters should come from the fully-resolved route.

I will update the patch shortly.

@enkechen-panw enkechen-panw changed the title zebra: calculate the correct IGP metric for recursive nexthop in NHT zebra: return fully-resolved route to NHT clients Sep 29, 2024
@enkechen-panw
Copy link
Contributor Author

enkechen-panw commented Sep 30, 2024

The recursive route is problematic in these cases as well:

Both BGP and staticd check for the route type. These checks should be on
the fully-resolved route instead of on intermediate routes.

bgp_nexthop_update()
  bgp_process_nexthop_update()
           if (import_check && (nhr->type == ZEBRA_ROUTE_BGP ||
                             !prefix_same(&bnc->prefix, &nhr->prefix))) {
                SET_FLAG(bnc->change_flags, BGP_NEXTHOP_CHANGED);
                UNSET_FLAG(bnc->flags, BGP_NEXTHOP_VALID);

static_zebra_nexthop_update():

        if (nhr->type == ZEBRA_ROUTE_CONNECT) {
                if (static_nexthop_is_local(vrf->vrf_id, matched,
                                            nhr->prefix.family))
                        nhr->nexthop_num = 0;
        }

zebra/rib.h Outdated
uint8_t resolved_distance;
uint16_t resolved_instance;
uint32_t resolved_metric;
struct prefix resolved_prefix;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this different from resolved_route?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resolve_route is the one from the first lookup in zebra_rnh_resolve_nexthop_entry(). In case of a recursive nexthop, the resolved_route is different from the fully-resolved route.

@enkechen-panw
Copy link
Contributor Author

Found an issue with the patch. There can still be inconsistencies between the route and the nexthop.
Working on a new patch.

Copy link

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Add a topotest case that a BGP nexthop being tracked in nht is first
resolved to another BGP route, and then recursively resolved to an OSPF
route. As a result, the correct IGP metric from OSPF is given to BGP
for bestpath calculation.

Signed-off-by: Enke Chen <[email protected]>
When a nexthop being tracked is resolved to a recursive route at the
first level of recursion, several parameters (such as the prefix,
metric and protocol type) of the recursive route are sent to clients,
along with the fully-resolved nexthop.

This creates an inconsistency or de-coupling between the route and its
nexthop, that is, the fully-resolved nexthop would belong to a different
route that is non-recursive. Thus the update message to clients would
contain parameters like prefix, metric and protocl type from a recursive
route, and the nexthop from a non-recursive route.

In particular, the protocol type and the metric fields in the update
message no longer reflect the property of the fully-resolved nexthop
in the meesage. It would be diffcult for a client to make sense of
these parameters.

As an example, when the nexthop being tracked is resolved to a BGP
route, instead of further resolving to an IGP route and getting its
metric, the MED value of the BGP route would be taken as the metric
and given to BGP for bestpath calculation, resulting in	incorrect
route selection.

The fix is to always calculate and return the fully-resolved route to
clients so that the route parameters like protocol type, distance,
metric and the resolved nexthops are consistent.

When a nht nexthop is resolved recursively, associate the nht nexthop
with each of the recursive routes so that the nht entry is re-evaluated
when there is a change to any of the dependent routes.

Signed-off-by: Enke Chen <[email protected]>
@enkechen-panw
Copy link
Contributor Author

enkechen-panw commented Oct 18, 2024

@donaldsharp @ton31337 @mwinter-osr @riw777 I have re-worked the patch. Please review. Thanks.

There is one topotest failure in which the resolved nextops (for FIB) are different. I don't understand how the NHT could impact the real nexthop resolution. Can I get some help with that topotest?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants