Compare PodCIDRs in NodeRouteController #7034
base: main
```diff
@@ -19,6 +19,7 @@ import (
 	"fmt"
 	"net"
 	"net/netip"
+	"slices"
 	"sync"
 	"time"
```
```diff
@@ -548,21 +549,32 @@ func (c *Controller) addNodeRoute(nodeName string, node *corev1.Node) error {
 		return err
 	}
 	peerWireGuardPublicKey := node.Annotations[types.NodeWireGuardPublicAnnotationKey]
+	podCIDRStrs := getPodCIDRsOnNode(node)
+	if len(podCIDRStrs) == 0 {
+		// If no valid PodCIDR is configured in Node.Spec, return immediately.
+		return nil
+	}

 	nrInfo, installed, _ := c.installedNodes.GetByKey(nodeName)
-	// Route is already added for this Node and Node MAC, transport IP
-	// and WireGuard public key are not changed.
+	samePodCIDRs := func(podCIDRs []*net.IPNet) bool {
+		// There is no reason to worry about the ordering of the CIDRs here. In a dual-stack
+		// cluster, the CIDRs should always be ordered consistently by IP family. Even if
+		// this was not true, samePodCIDRs would just return false and we would end up
+		// re-installing the same routes (and the podCIDRs field in nodeRouteInfo would be
+		// updated).
+		return slices.EqualFunc(podCIDRStrs, podCIDRs, func(cidrStr string, cidr *net.IPNet) bool {
+			return cidr.String() == cidrStr
+		})
+	}
+	// Route is already added for this Node and Node MAC, transport IP, podCIDRs and WireGuard
+	// public key are not changed.
```
Comment on lines +569 to +570

> Reviewer: Previously there was no need to delete stale routes first, because the PodCIDR is used as the key for routes/flows. Since it can change now, should we delete stale routes first when it changes?

> Author: I will fix that. Old flows should already be deleted, because our OF client uses the Node name as the key (and I think we handle this case already). But there is definitely an issue with routes, as they do use the PodCIDR as the key.
```diff
 	if installed && nrInfo.(*nodeRouteInfo).nodeMAC.String() == peerNodeMAC.String() &&
 		peerNodeIPs.Equal(*nrInfo.(*nodeRouteInfo).nodeIPs) &&
+		samePodCIDRs(nrInfo.(*nodeRouteInfo).podCIDRs) &&
 		nrInfo.(*nodeRouteInfo).wireGuardPublicKey == peerWireGuardPublicKey {
 		return nil
 	}

-	podCIDRStrs := getPodCIDRsOnNode(node)
-	if len(podCIDRStrs) == 0 {
-		// If no valid PodCIDR is configured in Node.Spec, return immediately.
-		return nil
-	}
 	klog.InfoS("Adding routes and flows to Node", "Node", nodeName, "podCIDRs", podCIDRStrs,
 		"addresses", node.Status.Addresses)
```
> Reviewer: Should the previous routes be deleted if there are any? Otherwise the following situation may cause an issue. Suppose Node C observed two events: Node B was updated with PodCIDR 10.0.0.0/24, and Node A was updated with an empty PodCIDR.
> - When reconciling Node B, the controller won't create a new route to 10.0.0.0/24 because of the following `nodesHaveSamePodCIDR` check.
> - When reconciling Node A, the controller won't delete the old route to 10.0.0.0/24 because it returns immediately.

> Author: Yes, you're right, that's an issue. I think I was assuming that a Node wouldn't stay without a PodCIDR, and that it would be resolved, but I guess this situation can last if the cluster has run out of Node CIDRs.
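The stale-route scenario discussed above can be reduced to a toy model. The sketch below is hypothetical Go (none of these names exist in Antrea): it models the host routes as a map keyed by destination CIDR, which is how the discussion describes Antrea keying routes by PodCIDR, and shows that re-adding routes for a Node without first deleting the routes it previously owned leaves a stale entry behind when the Node's PodCIDR changes.

```go
package main

import "fmt"

// routeTable is a toy stand-in for a host route table keyed by destination
// CIDR. It is a hypothetical model, not Antrea's route client.
type routeTable map[string]string // destination CIDR -> owning Node name

// addNodeRouteNaive installs routes for a Node's current PodCIDRs without
// removing routes installed for its previous PodCIDRs. If the Node's CIDR
// changed, the old entry is left behind.
func addNodeRouteNaive(rt routeTable, node string, podCIDRs []string) {
	for _, cidr := range podCIDRs {
		rt[cidr] = node
	}
}

// addNodeRoute first deletes routes previously attributed to the Node, then
// installs the new ones, so a CIDR change cannot leave a stale route.
func addNodeRoute(rt routeTable, node string, podCIDRs []string) {
	for cidr, owner := range rt {
		if owner == node {
			delete(rt, cidr)
		}
	}
	addNodeRouteNaive(rt, node, podCIDRs)
}

func main() {
	rt := routeTable{}
	addNodeRouteNaive(rt, "nodeA", []string{"10.0.0.0/24"})
	// nodeA's PodCIDR changes; the naive re-add leaves the stale route behind.
	addNodeRouteNaive(rt, "nodeA", []string{"10.0.1.0/24"})
	fmt.Println(len(rt)) // 2: stale 10.0.0.0/24 plus 10.0.1.0/24

	rt = routeTable{}
	addNodeRoute(rt, "nodeA", []string{"10.0.0.0/24"})
	addNodeRoute(rt, "nodeA", []string{"10.0.1.0/24"})
	fmt.Println(len(rt)) // 1: only 10.0.1.0/24 remains
}
```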