No way to influence FRR's choice of multihomed gateway peer #14836

Closed
auranext opened this issue Nov 20, 2023 · 4 comments
@auranext

auranext commented Nov 20, 2023

Hello FRR team and community,

I'm working on a project that will integrate 2 Juniper MX204 routers into our network.
Currently, the network is based on VXLAN/EVPN, which only provides L2 bridging between different vendors.

• HP switches
• H3C switches
• Proxmox/FRR servers
• Etc.

The new MX204s will act as:

• Gateway for the VMs hosted on bare-metal servers => bridged VXLAN via a VXLAN switch
• Gateway for the VMs hosted on Proxmox/FRR servers => bridged VXLAN via the hypervisor

The MX204 implements a virtual gateway, so each MX advertises the same IP and MAC via BGP/EVPN, but each one advertises its own ESI.

Everything works except the virtual gateway, which does not behave as expected.
FRR chooses the MH next hop by numerically comparing the peer IPs, so only one gateway can be active at a time.
We expect FRR to choose the closest peer, either automatically or when we adjust OSPF/BGP attributes manually.
We performed a series of tests:

  1. Set a degraded OSPF cost on the path to the GW selected by FRR
    => no change
  2. Set a degraded local-pref on the EVPN type-2 routes received from the GW selected by FRR (route-map)
    => no change
  3. Set a degraded weight on the EVPN type-2 routes received from the GW selected by FRR (route-map)
    => no change

The OSPF cost change is applied (verified with "show ip ospf route"), but FRR seems to ignore it when choosing the next hop.

Sample output with modified cost:

ITXPVE09TEST# show ip ospf route
============ OSPF network routing table ============
N    10.255.1.51/32        [12011] area: 0.0.0.0
                           via 50.1.0.2, ens1f0np0
N    10.255.2.51/32        [11011] area: 0.0.0.0
                           via 50.1.0.2, ens1f0np0
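
For reference, the cost comparison in this output can be checked mechanically. The following is a minimal Python sketch, not part of FRR: the helper name `parse_ospf_costs` is made up for illustration, and the regular expression only assumes the line layout shown above. By OSPF cost alone, 10.255.2.51 is the closer gateway.

```python
import re

# Sample copied from the "show ip ospf route" output above.
SAMPLE = """\
N    10.255.1.51/32        [12011] area: 0.0.0.0
                           via 50.1.0.2, ens1f0np0
N    10.255.2.51/32        [11011] area: 0.0.0.0
                           via 50.1.0.2, ens1f0np0
"""

# Matches lines such as "N    10.255.1.51/32        [12011] area: 0.0.0.0".
ROUTE_RE = re.compile(r"^N\s+(\S+)/\d+\s+\[(\d+)\]", re.MULTILINE)


def parse_ospf_costs(text):
    """Return {address: cost} for each network route found in the text."""
    return {addr: int(cost) for addr, cost in ROUTE_RE.findall(text)}


costs = parse_ospf_costs(SAMPLE)
closest = min(costs, key=costs.get)  # address with the lowest OSPF cost
print(costs, closest)
```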

The BGP local-pref and weight changes are also applied (verified with "show bgp l2vpn evpn route vni 29903 json"), but FRR ignores them as well.

Sample output with modified weight:

  "[2]:[29903]:[48]:[00:00:00:00:00:00]:[32]:[10.20.20.110]":{
    "prefix":"[2]:[29903]:[48]:[00:00:00:00:00:00]:[32]:[10.20.20.110]",
    "prefixLen":352,
    "paths":[
      [
        {
          "valid":true,
          "pathFrom":"internal",
          "routeType":2,
          "ethTag":29903,
          "macLen":48,
          "mac":"00:00:5e:29:99:03",
          "ipLen":32,
          "ip":"10.20.20.110",
          "locPrf":100,
          "weight":2048,
          "peerId":"51.2.0.2",
          "path":"",
          "origin":"IGP",
          "esi":"05:00:00:01:90:00:00:74:cf:00",
          "extendedCommunity":{
            "string":"RT:400:29903 ET:8 MM:0, sticky MAC"
          },
          "nexthops":[
            {
              "ip":"10.255.2.51",
              "afi":"ipv4",
              "used":true
            }
          ]
        }
      ],
      [
        {
          "valid":true,
          "bestpath":true,
          "selectionReason":"EVPN lower IP",
          "pathFrom":"internal",
          "routeType":2,
          "ethTag":29903,
          "macLen":48,
          "mac":"00:00:5e:29:99:03",
          "ipLen":32,
          "ip":"10.20.20.110",
          "locPrf":100,
          "weight":1024,
          "peerId":"51.2.0.2",
          "path":"",
          "origin":"IGP",
          "esi":"05:00:00:01:90:00:00:74:cf:00",
          "extendedCommunity":{
            "string":"RT:400:29903 ET:8 MM:0, sticky MAC"
          },
          "nexthops":[
            {
              "ip":"10.255.1.51",
              "afi":"ipv4",
              "used":true
            }
          ]
        }
      ]
    ]
  },
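
To compare the two paths side by side, the JSON can be reduced to the fields that matter here. This is a small Python sketch under stated assumptions: the sample is a trimmed copy of the fragment above wrapped in a single top-level object (the real command output may nest it differently), and `summarize_paths` is a hypothetical helper name.

```python
import json

# Trimmed copy of the JSON above: only the fields used below are kept.
SAMPLE = """
{
  "[2]:[29903]:[48]:[00:00:00:00:00:00]:[32]:[10.20.20.110]": {
    "paths": [
      [{"valid": true, "weight": 2048, "locPrf": 100,
        "nexthops": [{"ip": "10.255.2.51", "afi": "ipv4", "used": true}]}],
      [{"valid": true, "bestpath": true, "selectionReason": "EVPN lower IP",
        "weight": 1024, "locPrf": 100,
        "nexthops": [{"ip": "10.255.1.51", "afi": "ipv4", "used": true}]}]
    ]
  }
}
"""


def summarize_paths(doc):
    """Yield (next hop, weight, locPrf, is_best, reason) for every path."""
    for route in doc.values():
        for group in route["paths"]:  # "paths" is a list of lists
            for path in group:
                yield (
                    path["nexthops"][0]["ip"],
                    path.get("weight"),
                    path.get("locPrf"),
                    path.get("bestpath", False),
                    path.get("selectionReason"),
                )


rows = list(summarize_paths(json.loads(SAMPLE)))
for row in rows:
    print(row)
```

The path with the higher weight (2048, via 10.255.2.51) is not marked as best; the best path is the one via 10.255.1.51.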

In every test, we notice that FRR keeps the same selection reason:

"selectionReason":"EVPN lower IP",

FRR version : 9.0.1-0~deb11u1
Linux version : 5.15.116-1-pve

Can you help me with this FRR MH use case?

Thank you

Maxime

@ton31337
Member

I suggest looking at https://github.com/FRRouting/frr/tree/master/tests/topotests/bgp_evpn_mh. It is an example of a working EVPN MH setup. Also, without the configuration and topology, nobody can really help you.

@auranext
Author

auranext commented Mar 19, 2024

Hi, thank you for your reply.

Here are the detailed topology and configuration of the setup.

1- Detailed topology:

The network is based on VXLAN/EVPN, which only provides L2 bridging between different vendors.
• HP/H3C switches act as VTEPs
• HP/H3C routers act as spines and route reflectors
• Proxmox/FRR servers act as VTEPs
• 2x Juniper MX204 act as VTEPs

EVPN interop has worked well for years. For this example I describe a single Proxmox node (they all follow the same scheme).
The recently introduced MXs provide the gateway IP for VMs on Proxmox in vxlan29903.
Juniper MX1 underlay IP: 10.255.1.51
Juniper MX2 underlay IP: 10.255.2.51
For redundancy, the MX brings a new feature: anycast virtual gateway.
This is how we discovered EVPN multihoming and type-1 messages.
The virtual GW has mac:00:00:5e:29:99:03 and ip:10.20.20.110
As mentioned in the previous post, the EAD-ESI route is advertised by the MXs.
FRR receives them correctly, adds the 2 next hops, and creates an NH group in the nexthop table.

root@ITXPVE09TEST:~# ip nexthop l
id 268435458 via 10.255.1.51 scope link fdb
id 268435459 via 10.255.2.51 scope link fdb
id 536870913 group 268435458/268435459 fdb

The VGA type-2 route carries the ESI, and the bridge FDB (as shown by iproute2) contains the VGA MAC attached to the NH group:

root@ITXPVE09TEST:~# bridge fdb | grep 00:00:5e:29:99:03
00:00:5e:29:99:03 dev vxlan29903 sticky master vmbr29903 static
00:00:5e:29:99:03 dev vxlan29903 nhid 536870913 self sticky static
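
The chain from the gateway MAC to the two VTEPs can be followed by joining the two outputs above on the `nhid`. The sketch below is illustrative Python that works on the pasted text only; `parse_nexthops` and `vteps_for_mac` are hypothetical helper names, and the parsing assumes the line formats shown in this post.

```python
import re

# Copied from the `ip nexthop l` output above.
NEXTHOPS = """\
id 268435458 via 10.255.1.51 scope link fdb
id 268435459 via 10.255.2.51 scope link fdb
id 536870913 group 268435458/268435459 fdb
"""

# Copied from the `bridge fdb` output above.
FDB = """\
00:00:5e:29:99:03 dev vxlan29903 sticky master vmbr29903 static
00:00:5e:29:99:03 dev vxlan29903 nhid 536870913 self sticky static
"""


def parse_nexthops(text):
    """Split nexthop lines into single next hops and groups."""
    single, groups = {}, {}
    for line in text.splitlines():
        m = re.match(r"id (\d+) via (\S+)", line)
        if m:
            single[int(m.group(1))] = m.group(2)
            continue
        m = re.match(r"id (\d+) group (\S+)", line)
        if m:
            # Members are separated by "/"; drop a ",weight" suffix if present.
            groups[int(m.group(1))] = [
                int(member.split(",")[0]) for member in m.group(2).split("/")
            ]
    return single, groups


def vteps_for_mac(mac, fdb_text, nh_text):
    """Resolve the nhid of a MAC's FDB entry to the VTEP addresses behind it."""
    single, groups = parse_nexthops(nh_text)
    for line in fdb_text.splitlines():
        m = re.match(rf"{re.escape(mac)} .*\bnhid (\d+)", line)
        if m:
            nhid = int(m.group(1))
            members = groups.get(nhid, [nhid])
            return [single[i] for i in members]
    return []


vteps = vteps_for_mac("00:00:5e:29:99:03", FDB, NEXTHOPS)
print(vteps)
```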

So the MH VGA is active and reachable in VNI 29903.
The point of the MX VGA is the ability to choose the nearest next hop. FRR seems unable to do this and does not honor preferences configured manually through OSPF (cost) or BGP attributes (local-pref, weight).

What we expect: FRR prefers the next hop based on underlay distance (OSPF path cost) and honors BGP preferences.
What we observe: FRR prefers the next hop by EVPN lower IP, as mentioned in the previous post ("selectionReason":"EVPN lower IP").
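
The gap between the two behaviours can be shown with the values from the first post. This is only a Python illustration of what the "EVPN lower IP" reason appears to describe (a numeric comparison of the next-hop addresses), not FRR's actual best-path code; `pick_lower_ip` and `pick_highest_weight` are hypothetical helpers.

```python
import ipaddress

# (next hop, weight) pairs taken from the JSON output in the first post.
PATHS = [("10.255.2.51", 2048), ("10.255.1.51", 1024)]


def pick_lower_ip(paths):
    """Return the path whose next-hop address is numerically lowest."""
    return min(paths, key=lambda p: int(ipaddress.ip_address(p[0])))


def pick_highest_weight(paths):
    """Return the path with the highest BGP weight."""
    return max(paths, key=lambda p: p[1])


observed = pick_lower_ip(PATHS)        # what FRR reports as best path
by_weight = pick_highest_weight(PATHS)  # what the weight setting asks for
print(observed, by_weight)
```

With these inputs the two rules pick different gateways, which matches the observation that changing the weight does not move the best path.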

2- Detailed Proxmox configuration:

#dummy0 ip-unnumbered 
auto ens1f0np0
iface ens1f0np0 inet static
        address 50.1.1.9
        netmask 255.255.255.255
        mtu 9000

auto dummy0
iface dummy0 inet static
        address 50.1.1.9
        netmask 255.255.255.255
        mtu 9000

auto vmbr29903
iface vmbr29903 inet manual
        # TESTING_vxlan29903
        bridge_waitport 0
        bridge_stp off
        bridge_fd 0
        bridge_ports none
        mtu 8950
        post-up /sbin/ip link set arp off dev $IFACE || true

auto vxlan29903
iface vxlan29903 inet manual
        mtu 8950
        pre-up /sbin/ip link add vxlan29903 type vxlan id 29903 dstport 4789 local 50.1.1.9 nolearning dev dummy0 || true
        post-up /sbin/brctl addif vmbr29903 $IFACE || true
        post-up /sbin/bridge link set dev $IFACE learning off || true

An OSPF adjacency is up to each spine (there are 4 spines).
Here is the FRR configuration.
To interoperate correctly we need the 'disable-ead-evi-rx' directive, because Junos sends ead-esi but not ead-evi.

frr version 9.0.1
frr defaults traditional
hostname ITXPVE09TEST
log syslog
no ip forwarding
no ipv6 forwarding
no zebra nexthop kernel enable
bgp no-rib
service integrated-vtysh-config
!
debug zebra events
debug zebra kernel
debug zebra vxlan
debug zebra evpn mh es
debug zebra evpn mh nh
debug zebra evpn mh mac
debug zebra evpn mh neigh
debug ospf zebra
debug ospf event
debug bgp neighbor-events
debug bgp updates in
debug bgp updates out
debug bgp zebra
!
debug route-map
!
interface ens1f0np0
 ip ospf mtu-ignore
 ip ospf network point-to-point
exit
!
interface ens1f1np1
 ip ospf mtu-ignore
 ip ospf network point-to-point
exit
!
router bgp 400
 bgp router-id 50.1.1.9
 bgp log-neighbor-changes
 no bgp default ipv4-unicast
 coalesce-time 1000
 bgp graceful-shutdown
 bgp graceful-restart
 neighbor fabric peer-group
 neighbor fabric remote-as 400
 neighbor fabric update-source 50.1.1.9
 neighbor 50.1.0.1 peer-group fabric
 neighbor 50.1.0.2 peer-group fabric
 neighbor 51.2.0.1 peer-group fabric
 neighbor 51.2.0.2 peer-group fabric
 !
 address-family l2vpn evpn
  neighbor fabric activate
  neighbor fabric route-map TEST2 in
  advertise-all-vni
  vni 29903
   rd 1:29903
  exit-vni
    no use-es-l3nhg
  disable-ead-evi-rx
 exit-address-family
exit
!
router ospf
 ospf router-id 50.1.1.9
 log-adjacency-changes detail
 network 50.1.1.9/32 area 0
exit
!
route-map TEST2 permit 11
 match ip next-hop address 10.255.1.51
 set local-preference 110
exit
!
route-map TEST2 permit 21
 match ip next-hop address 10.255.2.51
 set local-preference 210
exit

As I said in the previous post, I'm trying to understand how FRR chooses the next hop in this MH context, and why OSPF cost or BGP local-pref settings have no effect on this choice.

thx all !
have a nice day ;)

github-actions bot commented Jan 8, 2025

This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.

@frrbot

frrbot bot commented Jan 8, 2025

This issue will be automatically closed in the specified period unless there is further activity.

@frrbot frrbot bot closed this as completed Jan 15, 2025
@frrbot frrbot bot removed the autoclose label Jan 15, 2025