-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zebra: check multipath before using recursively resolved blackhole nh #14606
base: master
Are you sure you want to change the base?
Conversation
I really don't understand how this can work. It's not currently possible to have a ecmp path which contains a blackhole as far as my understanding goes. Can I get a example that shows the problem you are solving? |
Baseline Configuration:
In the configuration above, if eth3 goes down, 30.0.0.0/28 will resolve recursively over 192.168.0.0/16 (blackhole nexthop) due to longest prefix matching.
routes in kernel: (note how 30.0.0.0/28 is marked as a blackhole)
This is because 192.168.12.2 is first in the list of nexthops. Compare this to a case where eth2 is brought down instead. Ignore the
routes in kernel:
There is a discrepancy in how zebra treats a route with a recursively resolved blackhole nexthop simply based on the order that the nexthops are processed in. (smaller IP is earlier in the list). |
This fix is restoring the |
When checking if a nexthop is a blackhole, zebra looks at the route's very first nexthop in the list, checks if it is a blackhole or recursively resolving over a blackhole and sends an update to the kernel marking the whole route as a blackhole. Resulting in complete traffic blackholing. This breaks multipath, where there might be other valid nexthops further down the list and traffic could be forwarded to those other nexthops. If number of nexthops > 1, zebra should not simply check the first nexthop in the list. Let zebra's multipath handling take over. Signed-off-by: Abhishek Naik <[email protected]>
def7838
to
9cde90d
Compare
Signed-off-by: Abhishek Naik <[email protected]>
9cde90d
to
eacf52b
Compare
@ton31337 can I get a review on the revised PR. Thanks! |
IMO zebra should not allow a ecmp path that has a |
staticd should also probably be changed to notice that one of it's paths is unreachable and make an appropriate choice about the route. I still need to think about this a bit more too though |
@bhinin I was thinking about this some more and perhaps I am missing something. Why do you need a ecmp route where one of the nexthops is a blackhole? Perhaps I am missing the use case as that I cannot think of any way that this is useful in routing. Can you take a step back and give us the requirements of what you are trying to do? |
@donaldsharp, I don't need one of the paths to resolve over blackhole. It's occurring because of Longest Prefix Match over a blackhole route - that is a side effect of the aggregate blackhole route. See config below:
The configuration of static routes above is done using GW only nexthops:
When processing the paths, zebra checks if the very first nexthop is resolved over a blackhole. If yes, then the route to 30.0.0.0/28 will be marked as a blackhole. But there is an inconsistency here based on which interface goes down.
This code change addresses that inconsistency where if multipath is present, don't simply look at the first nexthop. Look at all. |
That is not my question, I understand this part. It's the simple mechanics of how we are getting into this state. What is the meta problem that is causing you to get into this state? WHY do you want a route that has ecmp where one or more of the nexthops is a blackhole and one or more of the nexthops are |
I fixed the problem by using |
@donaldsharp, following up on this. Even with this being fixed higher up in zebra or staticd, I believe the fix here is required as zebra is treating nexthops differently based on where it is in the list. |
with my proposed changes the dplane part of zebra will never see this route. It's a moot point. Like I said we need to fix both staticd and zebra main processing pthread to do the right thing. |
This PR is stale because it has been open 180 days with no activity. Comment or remove the |
In case a static route is resolving over multiple nexthops (multipath), if the first nexthop in the list is recursively resolving over a blackhole, zebra does not bother looking at other nexthops and sends a netlink update to the kernel marking the route as a blackhole. This results in complete traffic blackholing even in a multipath setup.
If number of nexthops > 1, do not bother checking for recursively resolved blackhole nexthop. Let zebra's multipath handling take over.