Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zebra: check multipath before using recursively resolved blackhole nh #14606

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

bhinin
Copy link
Contributor

@bhinin bhinin commented Oct 17, 2023

In case a static route is resolving over multiple nexthops (multipath), if the first nexthop in the list is recursively resolving over a blackhole, zebra does not bother looking at other nexthops and sends a netlink update to the kernel marking the route as a blackhole. This results in complete traffic blackholing even in a multipath setup.

If number of nexthops > 1, do not bother checking for recursively resolved blackhole nexthop. Let zebra's multipath handling take over.

@donaldsharp
Copy link
Member

I really don't understand how this can work. It's not currently possible to have a ecmp path which contains a blackhole as far as my understanding goes. Can I get a example that shows the problem you are solving?

@bhinin
Copy link
Contributor Author

bhinin commented Oct 17, 2023

Baseline Configuration:

S>* 30.0.0.0/28 [251/0] via 192.168.12.2, eth3, weight 1, portion 0.333333, 00:00:24
  *                      via 192.168.13.2, eth1, weight 1, portion 0.333333, 00:00:24
  *                      via 192.168.14.2, eth2, weight 1, portion 0.333333, 00:00:24
S>* 192.168.0.0/16 [1/0] unreachable (blackhole), weight 1, portion 1.000000, 1d16h05m
C>* 192.168.12.0/24 is directly connected, eth3, portion 0.000000, 00:00:24
C>* 192.168.13.0/24 is directly connected, eth1, portion 0.000000, 1d17h41m
C>* 192.168.14.0/24 is directly connected, eth2, portion 0.000000, 1d19h21m

In the configuration above, if eth3 goes down, 30.0.0.0/28 will resolve recursively over 192.168.0.0/16 (blackhole nexthop) due to longest prefix matching.

S>  30.0.0.0/28 [1/0] via 192.168.12.2 (recursive), weight 1, 01:04:20
  *                     unreachable (blackhole), weight 1, portion 0.333333, 01:04:20
  *                   via 192.168.13.2, eth1, weight 1, portion 0.333333, 01:04:20
  *                   via 192.168.14.2, eth2, weight 1, portion 0.333333, 01:04:20
S>* 192.168.0.0/16 [1/0] unreachable (blackhole), weight 1, portion 1.000000, 01:05:14
C>* 192.168.13.0/24 is directly connected, eth1, portion 0.000000, 01:05:14
C>* 192.168.14.0/24 is directly connected, eth2, portion 0.000000, 01:05:14

routes in kernel: (note how 30.0.0.0/28 is marked as a blackhole)

blackhole 30.0.0.0/28 proto 196 metric 20
blackhole 192.168.0.0/16 proto 196 metric 20
192.168.13.0/24 dev eth1 proto kernel scope link src 192.168.13.1
192.168.14.0/24 dev eth2 proto kernel scope link src 192.168.14.1

This is because 192.168.12.2 is first in the list of nexthops.

Compare this to a case where eth2 is brought down instead. Ignore the rejected tag for now. That is due to a bug in the netlink update from zebra to kernel (#14613).

S>r 30.0.0.8/28 [251/0] via 192.168.12.2, eth2, weight 1, portion 0.333333, 00:00:02
  r                      via 192.168.13.2, eth1, weight 1, portion 0.333333, 00:00:02
  r                      via 192.168.14.2 (recursive), weight 1, 00:00:02
  r                        unreachable (blackhole), weight 1, portion 0.333333, 00:00:02

routes in kernel:

30.0.0.0/28 proto 196 metric 20
        nexthop via 192.168.12.2 dev eth3 weight 240
        nexthop via 192.168.13.2 dev eth1 weight 240
        nexthop via 192.168.14.2 dev eth2 weight 240 dead linkdown

There is a discrepancy in how zebra treats a route with a recursively resolved blackhole nexthop simply based on the order that the nexthops are processed in. (smaller IP is earlier in the list).

@bhinin
Copy link
Contributor Author

bhinin commented Oct 17, 2023

This fix is restoring the nexthop_num == 1 condition for processing blackhole nexthops. This was changed after 4be03ff.

When checking if a nexthop is a blackhole, zebra looks at the route's
very first nexthop in the list, checks if it is a blackhole or
recursively resolving over a blackhole and sends an update to the
kernel marking the whole route as a blackhole. Resulting in complete
traffic blackholing.

This breaks multipath, where there might be other valid nexthops
further down the list and traffic could be forwarded to those other
nexthops.

If number of nexthops > 1, zebra should not simply check the first
nexthop in the list. Let zebra's multipath handling take over.

Signed-off-by: Abhishek Naik <[email protected]>
@bhinin bhinin force-pushed the zebra_blackhole_nexhtop branch from def7838 to 9cde90d Compare October 20, 2023 20:35
@frrbot frrbot bot added the tests Topotests, make check, etc label Oct 20, 2023
@github-actions github-actions bot added size/L and removed size/S labels Oct 20, 2023
@bhinin bhinin force-pushed the zebra_blackhole_nexhtop branch from 9cde90d to eacf52b Compare October 20, 2023 20:36
@bhinin
Copy link
Contributor Author

bhinin commented Oct 24, 2023

@ton31337 can I get a review on the revised PR. Thanks!

@donaldsharp
Copy link
Member

IMO zebra should not allow a ecmp path that has a normal nexthop and a blackhole nexthop. This should be stopped higher up in zebra and make the route unresolvable.

@donaldsharp
Copy link
Member

staticd should also probably be changed to notice that one of it's paths is unreachable and make an appropriate choice about the route. I still need to think about this a bit more too though

@donaldsharp
Copy link
Member

@bhinin I was thinking about this some more and perhaps I am missing something. Why do you need a ecmp route where one of the nexthops is a blackhole? Perhaps I am missing the use case as that I cannot think of any way that this is useful in routing. Can you take a step back and give us the requirements of what you are trying to do?

@bhinin
Copy link
Contributor Author

bhinin commented Oct 31, 2023

@donaldsharp, I don't need one of the paths to resolve over blackhole. It's occurring because of Longest Prefix Match over a blackhole route - that is a side effect of the aggregate blackhole route. See config below:

S>* 30.0.0.0/28 [251/0] via 192.168.12.2, eth3, weight 1, portion 0.333333, 00:00:24
  *                      via 192.168.13.2, eth1, weight 1, portion 0.333333, 00:00:24
  *                      via 192.168.14.2, eth2, weight 1, portion 0.333333, 00:00:24
S>* 192.168.0.0/16 [1/0] unreachable (blackhole), weight 1, portion 1.000000, 1d16h05m

The configuration of static routes above is done using GW only nexthops:

ip route 30.0.0.0/28 192.168.12.2
ip route 30.0.0.0/28 192.168.13.2
ip route 30.0.0.0/28 192.168.14.2

interface eth1
    ip address 192.168.13.1/31

interface eth2
    ip address 192.168.14.1/31

interface eth3
    ip address 192.168.12.1/31

When processing the paths, zebra checks if the very first nexthop is resolved over a blackhole. If yes, then the route to 30.0.0.0/28 will be marked as a blackhole.

But there is an inconsistency here based on which interface goes down.

  1. If eth3 goes down, since 192.168.12.2 is the first nexthop and is now recursively resolving over blackhole, zebra will mark 30.0.0.0/28 as a blackhole.
  2. If eth2 goes down, zebra checks the first nexthop (192.168.12.2) sees that it is not resolving over blackhole and lets the multipath logic take over.

This code change addresses that inconsistency where if multipath is present, don't simply look at the first nexthop. Look at all.

@donaldsharp
Copy link
Member

That is not my question, I understand this part. It's the simple mechanics of how we are getting into this state. What is the meta problem that is causing you to get into this state? WHY do you want a route that has ecmp where one or more of the nexthops is a blackhole and one or more of the nexthops are normal? What is the problem that is being solved?

@bhinin
Copy link
Contributor Author

bhinin commented Oct 31, 2023

  1. I need ecmp, hence the static configuration using multiple GWs configured on interfaces.
  2. The device learns a default route from a neighbor.
  3. I need the blackhole route to advertise an aggregate route out to other neighbors over BGP. I could achieve this by NOT configuring it statically, and simply advertising it using router bgp > network 192.168.0.0 and no bgp network import-check. But with this, if one of the links in (1) were to go down, that path will now recursively resolve over the default route learned in (2) which I want to avoid. I just want that one path to be removed.

I fixed the problem by using GW + IF nexthop in the static route configuration. That way when the link goes down, that nexthop is removed.

@bhinin
Copy link
Contributor Author

bhinin commented Nov 13, 2023

@donaldsharp, following up on this. Even with this being fixed higher up in zebra or staticd, I believe the fix here is required as zebra is treating nexthops differently based on where it is in the list.

@donaldsharp
Copy link
Member

with my proposed changes the dplane part of zebra will never see this route. It's a moot point. Like I said we need to fix both staticd and zebra main processing pthread to do the right thing.

Copy link

This PR is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this PR closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants