Handle ASIC/SDK health event #1533
Signed-off-by: Stephen Sun <[email protected]>
doc/handle-ASIC-SDK-health-event/handle-ASIC-SDK-health-event.md
Community review recording: https://zoom.us/rec/share/bkb8Ed5drPXoyM1Yl_7-B6obBYYqlyyoY-AEF4BnErHCGrIDCSpJLhp-Bwbs6AJE.N_n2CjI_Pvr7Y5_o
If anyone wants to be a reviewer of this feature, please leave your comments here. Thanks.
Some comments from the HLD review meeting
Provided a way to limit the number of events stored in the database. This does not limit the rate at which the vendor SAI generates events, which is the vendor SAI's responsibility.
Won't address it for now.
We already did but still need to keep the
Vendor SAI should expose the capability only when it is supported completely.
Could you please add the code PRs, referring to #806? @stephenxs
@venkatmahalingam / @prvattem, please sign off.
Done.
@venkatmahalingam / @prvattem, a kind reminder. If there are no further comments, I will go ahead and merge the HLD.
ASIC/SDK health event
This document introduces a way for syncd to notify orchagent of an ASIC/SDK health event before asking orchagent to shut down.
For most Ethernet switches, the switch ASIC is the core component of the system. It is therefore important to identify when a switch ASIC is in a failure state and to report that event to the NOS.
Currently, such failures are detected by the SDK/FW on most platforms. A vendor SAI notifies orchagent to shut down using the switch_shutdown_request notification when it detects an ASIC/SDK internal error. Usually, the vendor SAI prints a log message before calling the shutdown API.
Orchagent can also abort itself if a SAI API call fails, usually due to a bad argument, and cannot be recovered. From a customer's perspective, this can be distinguished from an ASIC/SDK health event only by analyzing the log messages.
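The two abort paths described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyOrchagent`, its method names, and the message formats are hypothetical and do not mirror the real orchagent or SAI code. It shows why both paths end in the same observable state, with only the log text telling them apart.

```python
class ToyOrchagent:
    """Hypothetical stand-in for orchagent; not the real SONiC class."""

    def __init__(self):
        self.running = True
        self.log = []  # stands in for syslog

    def on_switch_shutdown_request(self, switch_id):
        # Path 1: the vendor SAI detected an ASIC/SDK internal error
        # and asked orchagent to shut down.
        self._abort(f"switch {switch_id:#x}: shutdown requested by SAI")

    def on_sai_api_failure(self, api_name, status):
        # Path 2: a SAI API call failed (for example, bad arguments)
        # and the failure cannot be recovered.
        self._abort(f"{api_name} failed with status {status}")

    def _abort(self, message):
        # Both paths converge here: the cause survives only as free text.
        self.log.append(message)
        self.running = False


health = ToyOrchagent()
health.on_switch_shutdown_request(0x21000000000000)

bad_call = ToyOrchagent()
bad_call.on_sai_api_failure("create_route_entry", -5)

# Same end state in both cases; only the log message differs.
print(health.running, bad_call.running)
print(health.log[0])
print(bad_call.log[0])
```

In this model no structured record of the cause exists after the abort, so a monitoring tool would have to parse the log messages to tell the two cases apart.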
The current implementation has the following limitations:
In this design, we will introduce a new way to address the limitations.
Implementation PRs
Signed-off-by: Stephen Sun [email protected]