
[Broadcom-DNX][Nokia][202405] Receiving PFC storms on the MACSEC ports causes orchagent crash. #20859

Open
amitpawar12 opened this issue Nov 19, 2024 · 2 comments
Labels
BRCM Chassis 🤖 Modular chassis support Issue for 202405 P0 Priority of the issue regression Triaged this issue has been triaged

Comments

@amitpawar12

Description

This was seen on a 202405 system with a Broadcom-DNX multi-ASIC chassis.

Receiving PFC frames on an egress MACSEC-enabled port causes orchagent to crash. This was tested on both 100 Gbps and 400 Gbps ports, and the crash is seen on both. The issue was seen with IXIA acting as traffic transmitter and receiver and also generating PFC storms towards the DUT egress port. However, it can easily be reproduced without IXIA as well.

Steps to reproduce the issue:

  1. Enable PFCWD and credit-watchdog.
  2. Configure MACSEC on the port.
  3. Ensure that the MACSEC protocol is up and running.
  4. Start traffic passing through the port. In this case, I was generating lossless traffic on priorities 3 and 4 and lossy traffic on priorities 0, 1 and 2.
  5. Send a PFC storm to the MACSEC port for the lossless priorities.
  6. Check the orchagent status or tail the syslog.
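
Step 6 can be automated with a small log scan. The sketch below is a hypothetical helper, not part of SONiC: the name `find_crash_markers` is made up for illustration, and the marker strings are simply copied from the log lines quoted in this report, so other failure modes may need different markers.

```python
# Marker substrings copied from the syslog excerpt in this report.
# Assumption: these two messages are enough to identify this crash.
CRASH_MARKERS = (
    "handleSaiSetStatus: Encountered failure in set operation",
    "exited: orchagent (terminated by SIGABRT",
)


def find_crash_markers(lines):
    """Return the syslog lines that contain any known crash marker."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(marker in line for marker in CRASH_MARKERS)
    ]


# Three lines taken from the report: one normal PFCWD notice, two crash markers.
sample = [
    "2024 Nov 19 14:11:41.157132 ixre-egl-board72 NOTICE swss0#orchagent: "
    ":- startWdActionOnQueue: Receive notification, storm",
    "2024 Nov 19 14:11:41.158789 ixre-egl-board72 ERR swss0#orchagent: "
    ":- handleSaiSetStatus: Encountered failure in set operation, exiting "
    "orchagent, SAI API: SAI_API_PORT, status: SAI_STATUS_UNINITIALIZED",
    "2024 Nov 19 14:11:41.758970 ixre-egl-board72 INFO swss0#supervisord "
    "2024-11-19 14:11:41,758 WARN exited: orchagent (terminated by SIGABRT "
    "(core dumped); not expected)",
]

hits = find_crash_markers(sample)
for hit in hits:
    print(hit)
```

To scan a live system, the same function could be fed an open syslog file object instead of the `sample` list.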

Describe the results you received:

The following logs are seen on the system when the PFC storm starts:

2024 Nov 19 14:11:27.621822 ixre-egl-board72 NOTICE syncd0#syncd: :- inc: 10000 (calls 10000) Syncd::syncUpdateRedisQuadEvent op took: 17 ms
2024 Nov 19 14:11:41.157132 ixre-egl-board72 NOTICE swss0#orchagent: :- startWdActionOnQueue: Receive notification, storm
2024 Nov 19 14:11:41.157132 ixre-egl-board72 NOTICE swss0#orchagent: :- report_pfc_storm: PFC Watchdog detected PFC storm on port Ethernet0, queue index 4, queue id 0x15000000000292 and port id 0x1000000000001
2024 Nov 19 14:11:41.157172 ixre-egl-board72 NOTICE swss0#orchagent: :- publish: EVENT_PUBLISHED: {"sonic-events-swss:pfc-storm":{"ifname":"Ethernet0","port_id":"281474976710657","queue_id":"5910974510924434","queue_index":"4","timestamp":"2024-11-19T14:11:41.156948Z"}}
2024 Nov 19 14:11:41.158361 ixre-egl-board72 ERR syncd0#syncd: [06:00.0] SAI_API_PORT:brcm_sai_set_port_attribute:3744 epfc tx config get failed with error Feature not initialized (0xffffffef).
2024 Nov 19 14:11:41.158361 ixre-egl-board72 ERR syncd0#syncd: :- sendApiResponse: api SAI_COMMON_API_SET failed in syncd mode: SAI_STATUS_UNINITIALIZED
2024 Nov 19 14:11:41.158424 ixre-egl-board72 ERR syncd0#syncd: :- processQuadEvent: VID: oid:0x1000000000001 RID: oid:0x100000001
2024 Nov 19 14:11:41.158467 ixre-egl-board72 ERR syncd0#syncd: :- processQuadEvent: attr: SAI_PORT_ATTR_PRIORITY_FLOW_CONTROL: 8
2024 Nov 19 14:11:41.158745 ixre-egl-board72 ERR swss0#orchagent: :- set: set status: SAI_STATUS_UNINITIALIZED
2024 Nov 19 14:11:41.158789 ixre-egl-board72 ERR swss0#orchagent: :- setPortPfc: Failed to set PFC 0x8 to port id 0x1000000000001 (rc:-12)
2024 Nov 19 14:11:41.158789 ixre-egl-board72 ERR swss0#orchagent: :- handleSaiSetStatus: Encountered failure in set operation, exiting orchagent, SAI API: SAI_API_PORT, status: SAI_STATUS_UNINITIALIZED
2024 Nov 19 14:11:41.158834 ixre-egl-board72 NOTICE swss0#orchagent: :- notifySyncd: sending syncd: SYNCD_INVOKE_DUMP
2024 Nov 19 14:11:41.159201 ixre-egl-board72 NOTICE syncd0#syncd: :- processNotifySyncd: Invoking SAI failure dump
2024 Nov 19 14:11:41.172648 ixre-egl-board72 NOTICE swss0#orchagent: :- sai_redis_notify_syncd: invoked DUMP succeeded
2024 Nov 19 14:11:41.758970 ixre-egl-board72 INFO swss0#supervisord 2024-11-19 14:11:41,758 WARN exited: orchagent (terminated by SIGABRT (core dumped); not expected)
2024 Nov 19 14:11:42.764781 ixre-egl-board72 INFO swss0#supervisor-proc-exit-listener: Process 'orchagent' exited unexpectedly. Terminating supervisor 'swss'
2024 Nov 19 14:11:42.965078 ixre-egl-board72 NOTICE swss0#supervisor-proc-exit-listener: :- publish: EVENT_PUBLISHED: {"sonic-events-host:process-exited-unexpectedly":{"ctr_name":"swss","process_name":"orchagent","timestamp":"2024-11-19T14:11:42.964796Z"}}
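
For readers correlating these lines, the sketch below checks that the decimal IDs in the published pfc-storm event match the hex IDs in the `report_pfc_storm` line, and decodes the PFC bitmask that the failing SET tried to write. The helper `pfc_priorities` is a hypothetical name. The idea that the port previously had PFC enabled on priorities 3 and 4 (mask 0x18) is an assumption based on the traffic described in the reproduction steps, not something the logs state.

```python
import json

# Event payload copied from the EVENT_PUBLISHED log line above.
event_json = (
    '{"sonic-events-swss:pfc-storm":{"ifname":"Ethernet0",'
    '"port_id":"281474976710657","queue_id":"5910974510924434",'
    '"queue_index":"4","timestamp":"2024-11-19T14:11:41.156948Z"}}'
)

event = json.loads(event_json)["sonic-events-swss:pfc-storm"]

# The event carries decimal strings; the other log lines print hex.
port_hex = hex(int(event["port_id"]))
queue_hex = hex(int(event["queue_id"]))


def pfc_priorities(mask):
    """Return the priorities whose bit is set in an 8-bit PFC mask."""
    return [prio for prio in range(8) if mask & (1 << prio)]


# Mask from "SAI_PORT_ATTR_PRIORITY_FLOW_CONTROL: 8" / "Failed to set PFC 0x8".
requested = pfc_priorities(0x8)

# Assumption: lossless priorities 3 and 4 were enabled before the storm.
assumed_before = pfc_priorities(0x18)

print(port_hex, queue_hex, requested, assumed_before)
```

Under that assumption, the failing write is consistent with the watchdog clearing the PFC bit for the stormed queue (index 4) while leaving priority 3 enabled.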

Check for the orch-agent status:

admin@ixre-egl-board72:~$ ls -lrt /var/core/
total 1952
-rw-r--r-- 1 root root 1996682 Nov 19 14:11 orchagent.1732025501.55.0.core.gz
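
The core file can be tied back to the log excerpt by its name. The sketch below assumes that the second dot-separated field of the file name is a Unix epoch timestamp; the function name `core_dump_time` is hypothetical.

```python
from datetime import datetime, timezone


def core_dump_time(filename):
    """Parse the assumed epoch field (second dot-separated token) as UTC."""
    epoch = int(filename.split(".")[1])
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


when = core_dump_time("orchagent.1732025501.55.0.core.gz")
print(when.isoformat())
```

If the assumption holds, the decoded time is 14:11:41 UTC on 2024-11-19, the same second as the `handleSaiSetStatus` failure in the syslog.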

Describe the results you expected:

orchagent should not crash. The MACSEC port should process the PFC storm and drop the incoming lossless priority traffic without generating any PFCs towards the IXIA transmitter.

Output of show version:

SONiC Software Version: SONiC.HEAD.875994-202405-7ae36b71e
SONiC OS Version: 12
Distribution: Debian 12.8
Kernel: 6.1.0-22-2-amd64
Build commit: 7ae36b71e
Build date: Wed Nov 13 08:58:17 UTC 2024
Built by: gitlab-runner@sonic-bld2

Platform: x86_64-nokia_ixr7250e_36x400g-r0
HwSKU: Nokia-IXR7250E-36x100G
ASIC: broadcom
ASIC Count: 2
Serial Number: XXXXXXXXX
Model Number: XXXXXXXXX
Hardware Revision: 56
Uptime: 14:47:14 up 2 days, 16:39,  1 user,  load average: 1.06, 1.21, 1.34
Date: Tue 19 Nov 2024 14:47:14

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

@saksarav-nokia
Contributor

It looks like SAI 11.2 introduced the feature BRCM_SAI_ENCRYPTED_PFC_SUPPORT. When the PFC setting is updated for a MACSEC-enabled port, SAI tries to get the encrypted PFC TX settings and returns a failure, so orchagent crashes. Created CSP CS00012378917.

@prgeor prgeor added Chassis 🤖 Modular chassis support Triaged this issue has been triaged and removed Triaged this issue has been triaged labels Nov 20, 2024
@prgeor prgeor added the Triaged this issue has been triaged label Nov 20, 2024
@rlhui rlhui added regression BRCM P0 Priority of the issue labels Nov 20, 2024
@saksarav-nokia
Contributor

There is a soc variable, "sai_encrypted_pfc_enable", to enable or disable Encrypted PFC. It is disabled by default, and we don't have this soc variable set. However, the code in brcm_sai_port.c doesn't check this flag; it calls the SDK API to get the PFC TX config, and that call fails.
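
The missing check described above can be modelled roughly as follows. This is an illustrative Python model only, not Broadcom SAI or SDK code: `set_port_pfc`, `epfc_tx_config_get`, `FeatureNotInitialized`, and the `soc` dictionary are hypothetical stand-ins. Only the `sai_encrypted_pfc_enable` name and the "Feature not initialized" error come from this thread.

```python
class FeatureNotInitialized(Exception):
    """Stand-in for the SDK's 'Feature not initialized' error."""


def epfc_tx_config_get(soc):
    # Hypothetical SDK call: fails unless encrypted PFC was initialized.
    if not soc.get("sai_encrypted_pfc_enable", 0):
        raise FeatureNotInitialized("epfc tx config get failed")
    return {"tx_enable": True}


def set_port_pfc(soc, macsec_enabled, mask, check_flag):
    """Model a PFC set on a port.

    check_flag=False models the behaviour reported here (no flag check);
    check_flag=True models a guard on sai_encrypted_pfc_enable.
    """
    feature_on = bool(soc.get("sai_encrypted_pfc_enable", 0))
    if macsec_enabled and (feature_on or not check_flag):
        epfc_tx_config_get(soc)  # raises when the feature is disabled
    return mask


soc = {}  # soc variable unset, i.e. Encrypted PFC disabled by default

try:
    set_port_pfc(soc, macsec_enabled=True, mask=0x8, check_flag=False)
    unguarded_failed = False
except FeatureNotInitialized:
    unguarded_failed = True

guarded_result = set_port_pfc(soc, macsec_enabled=True, mask=0x8, check_flag=True)
print(unguarded_failed, hex(guarded_result))
```

In this model, the unguarded path fails on a MACSEC port whenever the soc variable is unset, while the guarded path skips the encrypted PFC query and applies the mask. Whether the actual fix takes this shape is for the SAI maintainers to decide.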
