-
Notifications
You must be signed in to change notification settings - Fork 681
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SmartSwitch] Extend reboot script for rebooting SmartSwitch #3566
base: master
Are you sure you want to change the base?
Conversation
460146c
to
c72fbc0
Compare
8746356
to
d6fc624
Compare
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
Additionally remove informational logs which are flooding with so much info.
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run |
Azure Pipelines successfully started running 1 pipeline(s). |
REBOOT_SCRIPT_NAME=$(basename $0) | ||
REBOOT_TYPE="${REBOOT_SCRIPT_NAME}" | ||
TAG_LATEST=no | ||
REBOOT_FLAGS="" | ||
FORCE_REBOOT="no" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused variable
@@ -64,6 +73,9 @@ function stop_pmon_service() | |||
{ | |||
CONTAINER_STOP_RC=0 | |||
debug "Stopping pmon docker" | |||
if [[ "${PRE_SHUTDOWN}" == "yes" ]]; then | |||
systemctl disable pmon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this done so that we do not automatically restart PMON again? What about all the other non-critical dockers (other than gnmi)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this makes sure that pmon will not be restarted again. Other than pmon docker, there is no other service that is getting killed. Only syncd process is getting killed inside syncd container.
return ${EXIT_ERROR} | ||
fi | ||
|
||
if [[ "$REBOOT_TYPE" != $MODULE_REBOOT_SMARTSWITCH ]]; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a check to see if reboot is being done on a DPU which is currently powered off and skipping reboot for that specific DPU?(Admin State=Down Operational State=Down)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even in the case of MODULE_REBOOT_SMARTSWITCH
we still need to suppresss the pcied warnings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a check to see if reboot is being done on a DPU which is currently powered off and skipping reboot for that specific DPU?(Admin State=Down Operational State=Down)
We rely on platform API return code to determine this outcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even in the case of
MODULE_REBOOT_SMARTSWITCH
we still need to suppresss the pcied warnings
With the discussions that are in process, redis-cli requests will be done inside PCI attach/detach. Once that is finalized, I will modify it here.
is_smartswitch=$(python3 -c 'from utilities_common.chassis import is_smartswitch; print(is_smartswitch())') | ||
if [ "$is_smartswitch" == "True" ] && [ "$REBOOT_DPU" == "yes" ]; then | ||
exit $smart_switch_result | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think is_smartswitch() is already a helper function in reboot_smartswitch_helper
{ | ||
local dpu_ip=$1 | ||
local port=$2 | ||
$(docker exec gnmi gnoi_client -target "${dpu_ip}:${port}" -logtostderr -notls -module System -rpc RebootStatus &>/dev/null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need some integration test here to prevent CLI for gnoi_client accidentally changes. I recommend writing our own python grpc client, example: sonic-net/sonic-mgmt#17068, but that can be a long shot.
What I did
Extended reboot script for SmartSwitch cases to reboot entire SmartSwitch or a specific DPU
How I did it
Implemented changes according to https://github.com/sonic-net/SONiC/blob/605c3a56ac2717dbbb638433e7bb13054fc05a31/doc/smart-switch/reboot/reboot-hld.md
How to verify it
Previous command output (if the output of a command-line utility has changed)
New command output (if the output of a command-line utility has changed)