# Known Issues

The following is a list of current limitations of the Fabric, which we are
working hard to address.

### Deleting a VPC and creating a new one right away can cause the agent to fail

This issue is due to limitations in SONiC's gNMI interface, where operations still have to
be serialized in a certain order. In this particular case, deleting a VPC and creating a new
one back-to-back can lead to the deleted VPC's VNI being reused before the deletion has
taken effect on the switch.
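
For illustration only, the race can be triggered by a back-to-back sequence like the one
sketched below, where the VPC names and manifest file are placeholders: the new VPC may end
up being assigned the VNI that the switch has not yet finished cleaning up.

```
# Illustrative trigger (names and manifest are placeholders): the create races
# the cleanup of the deleted VPC's VNI on the switch
kubectl delete vpc vpc-old
kubectl apply -f vpc-new.yaml
```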

#### How to diagnose this issue

The applied generation on the affected agent, as reported by kubectl, will not
converge to the latest desired generation (see the sketch after the log excerpt
below for one way to check this). Additionally, the agent logs on the switch
(accessible in `/var/log/agent.log`) will contain an error similar to the following one:

```
time=2025-03-23T12:26:19.649Z level=ERROR msg=Failed err="failed to run agent: failed to process agent config from k8s: failed to process agent config loaded from k8s: failed to apply actions: GNMI set request failed: gnmi set request failed: rpc error: code = InvalidArgument desc = VNI is already used in VRF VrfVvpc-02"
```
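
One way to check the generations, sketched below, is to compare the desired and applied
generation reported on the switch's Agent object from the control node; the object name
`switch-1` and the `status.lastAppliedGen` field path are assumptions and may differ in
your installation.

```
# Sketch: compare desired vs. applied generation for the affected switch
# (object name and status field path are assumptions)
kubectl get agent switch-1 -o wide

kubectl get agent switch-1 \
  -o jsonpath='desired={.metadata.generation} applied={.status.lastAppliedGen}{"\n"}'
```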

#### Known workarounds

Deleting the pending VPCs will allow the agent to reconverge. After that, the
desired VPCs can be safely created.
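
For example, assuming the stuck VPC is called `vpc-2` and its manifest lives in
`vpc-2.yaml` (both placeholder names), the recovery could look like this:

```
# Remove the VPC(s) stuck pending so the agent can reconverge
kubectl delete vpc vpc-2

# Wait for the agent's applied generation to catch up with the desired one,
# then re-create the VPC from its manifest
kubectl apply -f vpc-2.yaml
```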

### VPC local peering can cause the agent to fail if subinterfaces are not supported on the switch

As explained in the [Architecture page](../architecture/fabric.md#vpc-peering), to work around
limitations in SONiC, local VPCPeering is implemented over a pair of loopback interfaces.
This workaround requires subinterface support on the switch where the VPCPeering is being
instantiated. If the affected switch does not meet this requirement, the agent will fail
to apply the desired configuration.

#### How to diagnose this issue

The applied generation on the affected agent, as reported by kubectl, will not
converge to the latest desired generation. Additionally, the agent logs on the switch
(accessible in `/var/log/agent.log`) will contain an error similar to the following one:

```
time=2025-02-04T13:37:58.675Z level=DEBUG msg=Action idx=90 weight=33 summary="Create Subinterface Base 101" command=update path="/interfaces/interface[name=Ethernet16]/subinterfaces/subinterface[index=101]"
time=2025-02-04T13:37:58.796Z level=ERROR msg=Failed err="failed to run agent: failed to process agent config from k8s: failed to process agent config loaded from k8s: failed to apply actions: GNMI set request failed: gnmi set request failed: rpc error: code = InvalidArgument desc = SubInterfaces are not supported"
```

#### Known workarounds

Configure remote VPCPeering wherever local peering would otherwise land on a switch that does
not support subinterfaces. You can double-check whether your switch model supports them by
looking at the [Switch Profiles Catalog](../reference/profiles.md) entry for it.
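
As a rough illustration, a remote peering is declared by pointing the VPCPeering at a switch
group that supports subinterfaces; the VPC names, the `border` switch group, the API version,
and the field layout below are all assumptions and may differ between releases, so consult
the user guide for the exact schema.

```
# Sketch of a remote VPCPeering (names, switch group, API version, and field
# layout are assumptions)
kubectl apply -f - <<EOF
apiVersion: vpc.githedgehog.com/v1beta1
kind: VPCPeering
metadata:
  name: vpc-1--vpc-2
spec:
  remote: border
  permit:
    - vpc-1: {}
      vpc-2: {}
EOF
```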

### External Peering over a connection originating from an MCLAG switch can fail

When importing routes via [External Peering](../user-guide/external.md) over a connection
originating from an MCLAG leaf switch, traffic from the peered VPC towards the imported
prefix can be blackholed. This is due to a routing mismatch between the two MCLAG leaves,
where only one switch learns the appropriate route to the imported prefix. If the originating
traffic hits the "wrong" leaf, it is dropped with a Destination Unreachable error.

#### How to diagnose this issue

There is no connectivity from the workload server(s) in the peered VPC towards the prefix
routed via the external.
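
For example, from a workload server in the peered VPC, probes towards the imported prefix
fail, typically with a Destination Unreachable reported by the MCLAG leaf that is missing
the route; the address below is a placeholder inside that prefix.

```
# From a workload server in the peered VPC (203.0.113.10 is a placeholder
# address inside the prefix imported via the external)
ping -c 3 203.0.113.10
traceroute 203.0.113.10
```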

#### Known workarounds

Connect your externals to non-MCLAG switches instead.

### MCLAG leaf with no surviving spine connection will blackhole traffic

When a leaf switch in an MCLAG pair loses its last uplink to the spines, its BGP sessions
towards the spines go down, causing the leaf to stop advertising and receiving EVPN routes.
This leads to blackholing of traffic for endpoints connected to the isolated leaf, as the rest
of the fabric no longer has reachability information for those endpoints, even though the
MCLAG peering session is up.

#### How to diagnose this issue

Traffic destined for endpoints connected to the leaf is blackholed. All BGP sessions from the
affected leaf towards the spines are down.
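
Assuming CLI access to the affected leaf, the underlay BGP state can also be checked directly
on the switch, for example:

```
# On the isolated MCLAG leaf: all sessions towards the spines show as down
show ip bgp summary

# The same information is available from FRR
vtysh -c "show bgp summary"
```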

#### Known workarounds

None.

### Cannot set PortSpeed for PortChannels

TBD

#### How to diagnose this issue

TBD

#### Known workarounds

TBD