
Commit 8226f23

docs: add known issues
Signed-off-by: Emanuele Di Pascale <[email protected]>
1 parent f685895 commit 8226f23

2 files changed: +96 / -0 lines changed

docs/.pages (+1)

@@ -9,6 +9,7 @@ nav:
  - Reference: reference
  - Architecture: architecture
  - Troubleshooting: troubleshooting
+ - Known Issues: known-issues
  - FAQ: faq
  - ...
  - release-notes

docs/known-issues/known-issues.md (+95)

@@ -0,0 +1,95 @@
# Known Issues

The following is a list of current limitations of the Fabric, which we are
working hard to address.

### Deleting a VPC and creating a new one right away can cause the agent to fail

This issue is due to limitations in SONiC's gNMI interface, where operations still have to
be serialized in a certain order. In this particular case, deleting and creating
a VPC back-to-back can lead to the reuse of the deleted VPC's VNI before the deletion has
taken effect.

#### How to diagnose this issue

The applied generation on the affected agent, as reported by kubectl, will not
converge to the last desired generation. Additionally, the agent logs on the switch
(accessible in `/var/log/agent.log`) will contain an error similar to the following one:

```
time=2025-03-23T12:26:19.649Z level=ERROR msg=Failed err="failed to run agent: failed to process agent config from k8s: failed to process agent config loaded from k8s: failed to apply actions: GNMI set request failed: gnmi set request failed: rpc error: code = InvalidArgument desc = VNI is already used in VRF VrfVvpc-02"
```
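
For reference, a minimal way to check both symptoms is sketched below. It assumes the Agent objects are exposed in the `fab` namespace and that you have shell access to the affected switch; adjust names to your environment.

```
# On the control node: compare the desired vs. applied generation columns
# reported for the affected agent (column names may vary between releases)
kubectl get agents -n fab

# On the switch: look for the VNI reuse error in the agent log
grep "VNI is already used" /var/log/agent.log | tail -n 5
```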

#### Known workarounds

Deleting the pending VPCs will allow the agent to reconverge. After that, the
desired VPCs can be safely created.
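
As a rough sketch, assuming the stuck VPC is named `vpc-02` and was originally created from a manifest file (both names are hypothetical):

```
# Delete the pending VPC so the agent can reconverge
kubectl delete vpc vpc-02

# Once the agent's applied generation has caught up, recreate the VPC
kubectl apply -f vpc-02.yaml
```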

### VPC local peering can cause the agent to fail if subinterfaces are not supported on the switch

As explained in the [Architecture page](../architecture/fabric.md#vpc-peering), to work around
limitations in SONiC, local VPCPeering is implemented over a pair of loopback interfaces.
This workaround requires subinterface support on the switch where the VPCPeering is being
instantiated. If the affected switch does not meet this requirement, the agent will fail
to apply the desired configuration.

#### How to diagnose this issue

The applied generation on the affected agent, as reported by kubectl, will not
converge to the last desired generation. Additionally, the agent logs on the switch
(accessible in `/var/log/agent.log`) will contain an error similar to the following one:

```
time=2025-02-04T13:37:58.675Z level=DEBUG msg=Action idx=90 weight=33 summary="Create Subinterface Base 101" command=update path="/interfaces/interface[name=Ethernet16]/subinterfaces/subinterface[index=101]"
time=2025-02-04T13:37:58.796Z level=ERROR msg=Failed err="failed to run agent: failed to process agent config from k8s: failed to process agent config loaded from k8s: failed to apply actions: GNMI set request failed: gnmi set request failed: rpc error: code = InvalidArgument desc = SubInterfaces are not supported"
```

#### Known workarounds

Configure remote VPCPeering wherever local peering would happen on a switch that does not support
subinterfaces. You can double-check whether your switch model supports them by looking at
the [Switch Profiles Catalog](../reference/profiles.md) entry for it.
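
For example, to see which profile a given switch references before deciding where the peering should land (a sketch assuming a switch object named `leaf-03`; the name is illustrative):

```
# Inspect the switch object and note the referenced profile, then look up
# that profile's entry in the Switch Profiles Catalog for subinterface support
kubectl get switch leaf-03 -o yaml | grep -i profile
```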

### External Peering over a connection originating from an MCLAG switch can fail

When importing routes via [External Peering](../user-guide/external.md) over a connection
originating from an MCLAG leaf switch, traffic from the peered VPC towards that
prefix can be blackholed. This is due to a routing mismatch between the two MCLAG leaves,
where only one switch learns the appropriate route to the imported prefix. If the originating
traffic hits the "wrong" leaf, it will be dropped with a Destination Unreachable error.

#### How to diagnose this issue

There is no connectivity from the workload server(s) in the VPC towards the prefix routed via the external.
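
A basic check from one of the workload servers illustrates the symptom (assuming `192.0.2.10` stands in for an address inside the externally routed prefix; the address is purely illustrative):

```
# Traffic towards the external prefix either times out or comes back
# with Destination Unreachable, depending on which MCLAG leaf it hits
ping -c 3 192.0.2.10
traceroute 192.0.2.10
```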

#### Known workarounds

Connect your externals to non-MCLAG switches instead.

### MCLAG leaf with no surviving spine connection will blackhole traffic

When a leaf switch in an MCLAG pair loses its last uplink to the spine, the BGP session to the spine goes down,
causing the leaf to stop advertising and receiving EVPN routes. This leads to blackholing of traffic for endpoints
connected to the isolated leaf, as the rest of the fabric no longer has reachability information for those endpoints,
even though the MCLAG peering session is up.

#### How to diagnose this issue

Traffic destined for endpoints connected to the leaf is blackholed. All BGP sessions from the affected leaf towards
the spines are down.
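
To confirm the second symptom, the BGP session state can be inspected directly on the affected leaf (a sketch assuming shell access to the SONiC switch; the available show commands can vary by SONiC distribution):

```
# Sessions towards the spines will not be in the Established state
show ip bgp summary

# Alternatively, query FRR directly
vtysh -c "show bgp summary"
```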

#### Known workarounds

None.

### Cannot set PortSpeed for PortChannels

TBD

#### How to diagnose this issue

TBD

#### Known workarounds

TBD

0 commit comments
