You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have been -- and am still trying to -- resolve this myself. In case I can't...
When adding VRFs with ip routes, FRR sometimes moves a route outside of the VRF stanza and into the default VRF. When this happens, typically vrf-policy and vnc defaults are added at the end of the first router bgp stanza just after the last address-family block despite not being in the config.
To show the evolution of the VRF stanzas, here are some debug lines from /var/log/frr/frr-reload.log. I cut the logs down to just show the VRF stanzas.
First we see the /etc/frr/frr.conf file parsed, showing complete and proper contexts.
2023-05-26 02:11:14,666 INFO: Loading Config object from file /etc/frr/frr.conf
2023-05-26 02:11:14,758 DEBUG: LINE vrf vrf2000004 : enter context ['vrf vrf2000004']
2023-05-26 02:11:14,758 DEBUG: LINE vni 2000004 : add to current context ['vrf vrf2000004']
2023-05-26 02:11:14,758 DEBUG: LINE ip route 10.6.96.4/32 vpcnode331 : add to current context ['vrf vrf2000004']
2023-05-26 02:11:14,758 DEBUG: LINE ip route 10.6.96.3/32 vpcnode338 : add to current context ['vrf vrf2000004']
2023-05-26 02:11:14,758 DEBUG: LINE exit-vrf : exit context ['vrf vrf2000004']
2023-05-26 02:11:14,759 DEBUG: LINE vrf vrf2000005 : enter context ['vrf vrf2000005']
2023-05-26 02:11:14,759 DEBUG: LINE vni 2000005 : add to current context ['vrf vrf2000005']
2023-05-26 02:11:14,759 DEBUG: LINE ip route 10.6.112.4/32 vpcnode332 : add to current context ['vrf vrf2000005']
2023-05-26 02:11:14,759 DEBUG: LINE ip route 10.6.112.3/32 vpcnode348 : add to current context ['vrf vrf2000005']
2023-05-26 02:11:14,759 DEBUG: LINE exit-vrf : exit context ['vrf vrf2000005']
Next is the new FRR conf with the same
2023-05-26 02:11:14,818 DEBUG: New Frr Config
vrf vrf2000004
vni 2000004
ip route 10.6.96.4/32 vpcnode331
ip route 10.6.96.3/32 vpcnode338
exit-vrf
end
!
vrf vrf2000005
vni 2000005
ip route 10.6.112.4/32 vpcnode332
ip route 10.6.112.3/32 vpcnode348
exit-vrf
end
!
Load from vtysh show running where we're first missing ip route 10.6.112.3/32 vpcnode348.
2023-05-26 02:11:14,819 INFO: Loading Config object from vtysh show running
2023-05-26 02:11:14,904 DEBUG: LINE vrf vrf2000004 : enter context ['vrf vrf2000004']
2023-05-26 02:11:14,904 DEBUG: LINE vni 2000004 : add to current context ['vrf vrf2000004']
2023-05-26 02:11:14,904 DEBUG: LINE ip route 10.6.96.3/32 vpcnode338 : add to current context ['vrf vrf2000004']
2023-05-26 02:11:14,904 DEBUG: LINE ip route 10.6.96.4/32 vpcnode331 : add to current context ['vrf vrf2000004']
2023-05-26 02:11:14,904 DEBUG: LINE exit-vrf : exit context ['vrf vrf2000004']
2023-05-26 02:11:14,904 DEBUG: LINE vrf vrf2000005 : enter context ['vrf vrf2000005']
2023-05-26 02:11:14,904 DEBUG: LINE vni 2000005 : add to current context ['vrf vrf2000005']
2023-05-26 02:11:14,904 DEBUG: LINE ip route 10.6.112.4/32 vpcnode332 : add to current context ['vrf vrf2000005']
2023-05-26 02:11:14,905 DEBUG: LINE exit-vrf : exit context ['vrf vrf2000005']
The first pass (i.e., pass 0) with the same route missing
2023-05-26 02:11:14,915 DEBUG: Running Frr Config (Pass #0)
vrf vrf2000004
vni 2000004
ip route 10.6.96.3/32 vpcnode338
ip route 10.6.96.4/32 vpcnode331
exit-vrf
end
!
vrf vrf2000005
vni 2000005
ip route 10.6.112.4/32 vpcnode332
exit-vrf
end
!
Data passed off to a temp file for the second pass -- the missing route!
Load from vtysh show running again where we now see the route in its own context. Additionally, vrf-policy and vnc defaults are defined in the first router bgp stanza after the last address-family block.
2023-05-26 02:11:15,113 INFO: Loading Config object from vtysh show running
2023-05-26 02:11:15,199 DEBUG: LINE ip route 10.6.112.3/32 vpcnode348 : single-line context
2023-05-26 02:11:15,199 DEBUG: LINE vrf vrf2000004 : enter context ['vrf vrf2000004']
2023-05-26 02:11:15,199 DEBUG: LINE vni 2000004 : add to current context ['vrf vrf2000004']
2023-05-26 02:11:15,199 DEBUG: LINE ip route 10.6.96.3/32 vpcnode338 : add to current context ['vrf vrf2000004']
2023-05-26 02:11:15,199 DEBUG: LINE ip route 10.6.96.4/32 vpcnode331 : add to current context ['vrf vrf2000004']
2023-05-26 02:11:15,199 DEBUG: LINE exit-vrf : exit context ['vrf vrf2000004']
2023-05-26 02:11:15,199 DEBUG: LINE vrf vrf2000005 : enter context ['vrf vrf2000005']
2023-05-26 02:11:15,200 DEBUG: LINE vni 2000005 : add to current context ['vrf vrf2000005']
2023-05-26 02:11:15,200 DEBUG: LINE ip route 10.6.112.4/32 vpcnode332 : add to current context ['vrf vrf2000005']
2023-05-26 02:11:15,200 DEBUG: LINE exit-vrf : exit context ['vrf vrf2000005']
...
2023-05-26 02:11:15,202 DEBUG: LINE vrf-policy vrf2000005 : enter context ['router bgp REDACTED', 'vrf-policy vrf2000005']
2023-05-26 02:11:15,203 DEBUG: LINE exit-vrf-policy : exit context ['router bgp REDACTED', 'vrf-policy vrf2000005']
2023-05-26 02:11:15,203 DEBUG: LINE vnc defaults : enter context ['router bgp REDACTED', 'vnc defaults']
2023-05-26 02:11:15,203 DEBUG: LINE exit-vnc : exit context ['router bgp REDACTED', 'vnc defaults']
The second pass (i.e., pass 1) with the route in the default VRF
ip route 10.6.112.3/32 vpcnode348
!
vrf vrf2000004
vni 2000004
ip route 10.6.96.3/32 vpcnode338
ip route 10.6.96.4/32 vpcnode331
exit-vrf
end
!
vrf vrf2000005
vni 2000005
ip route 10.6.112.4/32 vpcnode332
exit-vrf
end
!
Before continuing, note I added some of my own debugging and found the ip route command is hitting this condition: if x == 1 and ctx_keys[0].startswith("no "):. I'm not sure why, but ctx_keys[0] is showing as no ip route 10.6.112.3/32 vpcnode348 there. That leads us here, where more data is passed to a new file, including the route. (Also note that everything in the array was present twice this time for some reason.)
Did you test it on the latest FRRouting/frr master branch?
To Reproduce
I've been testing a new product with 1-2 VRFs, each with 1-2 interfaces attached. Reproduction can be oddly sporadic, but I find it happens most often when I have two VRFs defined and each has two interfaces attached. When a new interface is added to a VRF, /etc/frr/frr.conf is updated and a reload is performed: systemctl reload frr. When I check the running config with vtysh -c "show run", I can see the route issue. When this doesn't happen, all network activity is as expected; when it does, of course, there are problems.
Expected behavior
When reloading FRR, ip routes should stay in the VRF stanza and not move to the default VRF, and we shouldn't add VRF information into router bgp stanzas.
Screenshots
Versions
OS Version: Debian 11
Kernel: Linux 6.1.24-bullseye
FRR Version: 8.3.1 (but I also tested with the master branch's latest frr-reload.py file)
Additional context
Likely unrelated, but in my review, I found prior_ctx_key is a local variable in check_for_exit_vrf which is used before it's defined. I initialized it before the loop to None, but it didn't really seem to make a difference for my issue.
The text was updated successfully, but these errors were encountered:
I originally closed this since I thought it was resolved by something on our side, but I seemingly just found a way to make it more stable. I haven't been able to get a reply in the FRRouting Slack server, so I reopened this issue. Can someone review, please? I can provide more information as needed.
From the FRR Slack message I sent:
We've been having sporadic trouble with FRR reloads since we released a product months ago. Sometimes we don't see it much and other times we see it quite a bit. We've reviewed configs, tested various versions of FRR (including the most recent release, 9.0.1), and we still have reload failures. To be clear, any time we start/restart FRR, the configs appear as expected and all is well, but when we try to reload to keep the service up, many times we get misconfigurations. Usually we see some static routes in the default VRF instead of the particular VRF they belong to or vrf-policy and vnc defaults lines added to a router bgp <asn> stanza despite not using those in any of our configs. The misconfigurations may keep the service up with network connectivity problems after a reload, but a subsequent reload will cause FRR to restart, at which point the configs are correct at the cost of some downtime. Evidence leads us to believe the frr-reload.py script is the problem. Is there anyone who could assist in tracking down the issue?
Describe the bug
I have been -- and am still trying to -- resolve this myself. In case I can't...
When adding VRFs with
ip route
s, FRR sometimes moves a route outside of the VRF stanza and into the default VRF. When this happens, typicallyvrf-policy
andvnc defaults
are added at the end of the firstrouter bgp
stanza just after the lastaddress-family
block despite not being in the config.To show the evolution of the VRF stanzas, here are some debug lines from
/var/log/frr/frr-reload.log
. I cut the logs down to just show the VRF stanzas.First we see the
/etc/frr/frr.conf
file parsed, showing complete and proper contexts.Next is the new FRR conf with the same
Load from
vtysh show running
where we're first missingip route 10.6.112.3/32 vpcnode348
.The first pass (i.e., pass 0) with the same route missing
Data passed off to a temp file for the second pass -- the missing route!
Load from
vtysh show running
again where we now see the route in its own context. Additionally,vrf-policy
andvnc defaults
are defined in the firstrouter bgp
stanza after the lastaddress-family
block.The second pass (i.e., pass 1) with the route in the default VRF
Before continuing, note I added some of my own debugging and found the
ip route
command is hitting this condition:if x == 1 and ctx_keys[0].startswith("no "):
. I'm not sure why, butctx_keys[0]
is showing asno ip route 10.6.112.3/32 vpcnode348
there. That leads us here, where more data is passed to a new file, including the route. (Also note that everything in the array was present twice this time for some reason.)To Reproduce
I've been testing a new product with 1-2 VRFs, each with 1-2 interfaces attached. Reproduction can be oddly sporadic, but I find it happens most often when I have two VRFs defined and each has two interfaces attached. When a new interface is added to a VRF,
/etc/frr/frr.conf
is updated and a reload is performed:systemctl reload frr
. When I check the running config withvtysh -c "show run"
, I can see the route issue. When this doesn't happen, all network activity is as expected; when it does, of course, there are problems.Expected behavior
When reloading FRR,
ip route
s should stay in the VRF stanza and not move to the default VRF, and we shouldn't add VRF information intorouter bgp
stanzas.Screenshots
Versions
frr-reload.py
file)Additional context
Likely unrelated, but in my review, I found
prior_ctx_key
is a local variable incheck_for_exit_vrf
which is used before it's defined. I initialized it before the loop toNone
, but it didn't really seem to make a difference for my issue.The text was updated successfully, but these errors were encountered: