ARS POC #7

Open
wants to merge 247 commits into
base: master
Conversation

VladimirKuk

Why I did it

Work item tracking
  • Microsoft ADO (number only):

How I did it

How to verify it

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

VladimirKuk and others added 3 commits October 28, 2024 12:25
shiraez pushed a commit that referenced this pull request Dec 11, 2024
…et#21095)

Adding the below fix from FRR FRRouting/frr#17297

This fixes the following crash, which occurs intermittently:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/zebra -A 127.0.0.1 -s 90000000 -M dplane_fpm_nl -M snmp'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7fccd6faf7c0 (LWP 36))]
(gdb) bt
#0  0x00007fccd7351e2c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fccd7302fb2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fccd72ed472 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007fccd75bb3a9 in _zlog_assert_failed (xref=xref@entry=0x7fccd7652380 <_xref.16>, extra=extra@entry=0x0) at ../lib/zlog.c:678
#4  0x00007fccd759b2fe in route_node_delete (node=<optimized out>) at ../lib/table.c:352
#5  0x00007fccd759b445 in route_unlock_node (node=0x0) at ../lib/table.h:258
#6  route_next (node=<optimized out>) at ../lib/table.c:436
#7  route_next (node=node@entry=0x56029d89e560) at ../lib/table.c:410
#8  0x000056029b6b6b7a in if_lookup_by_name_per_ns (ns=ns@entry=0x56029d873d90, ifname=ifname@entry=0x7fccc0029340 "PortChannel1020")
    at ../zebra/interface.c:312
#9  0x000056029b6b8b36 in zebra_if_dplane_ifp_handling (ctx=0x7fccc0029310) at ../zebra/interface.c:1867
#10 zebra_if_dplane_result (ctx=0x7fccc0029310) at ../zebra/interface.c:2221
#11 0x000056029b7137a9 in rib_process_dplane_results (thread=<optimized out>) at ../zebra/zebra_rib.c:4810
#12 0x00007fccd75a0e0d in thread_call (thread=thread@entry=0x7ffe8e553cc0) at ../lib/thread.c:1990
#13 0x00007fccd7559368 in frr_run (master=0x56029d65a040) at ../lib/libfrr.c:1198
#14 0x000056029b6ac317 in main (argc=9, argv=0x7ffe8e5540d8) at ../zebra/main.c:478
VladimirKuk and others added 22 commits January 2, 2025 15:25
Signed-off-by: Vladimir Kuk <[email protected]>
Signed-off-by: Vladimir Kuk <[email protected]>
- Why I did it
After pull request sonic-net#19190, pmon was added to the start list in fast/warm reboot scenarios. However, certain non-critical pmon daemons can be delayed, which saves approximately 1 second in the reboot process. This eases the pressure on fast reboot, whose current time usage is close to the 30-second limit.

- How I did it
Add a script that monitors fast/warm reboot, along with the related supervisord rules.
Once the script exits, the reboot process has ended and the delayed daemons then initialize.

- How to verify it
check the fast/warm reboot time usage

Signed-off-by: Yuanzhe, Liu <[email protected]>
* [Micas/Platform]platform support M2-W6920-32QC2X

Signed-off-by: philo <[email protected]>

* update device files

Signed-off-by: philo <[email protected]>

* trigger rebuild

* rebuild

* rebuild

* rebuild

* trigger rebuild

* trigger rebuild

* trigger rebuild

---------

Signed-off-by: philo <[email protected]>
…onic-net#20726)

== Why I did it ==

Commit 06c469e added an extra redis instance. This resulted in a
two item string without linefeeds in /etc/supervisor/critical_processes:

  program:redisprogram:redis_bmp

That resulted in an error in syslog and docker-database failing.

== Work item tracking ==

ERR database#supervisor-proc-exit-listener: Syntax of the line
  program:redisprogram:redis_bmp#012 in processes file is incorrect.
  Exiting... (sonic-net#20636)

ossobv#17

== How I did it ==

Move the jinja2 whitespace-eating hyphens from BOL to EOL.

Note that j2 and the jinja2 parser in sonic-cfggen do not behave the
same. The sonic-cfggen one is the relevant one.

Before:

  $ j2 ./dockers/docker-database/critical_processes.j2.old -f json \
      <<< '{"INSTANCES":{"foo":"bar","baz":"..."}}'
  |
  |
  program:foo
  program:baz

  # docker exec database sonic-cfggen \
      -j /var/run/redis/sonic-db/database_config.json \
      -t /usr/share/sonic/templates/critical_processes.j2.old
  program:redisprogram:redis_bmp

After:

  $ j2 ./dockers/docker-database/critical_processes.j2 -f json \
      <<< '{"INSTANCES":{"foo":"bar","baz":"..."}}'
  program:foo
  program:baz

  # docker exec database sonic-cfggen \
      -j /var/run/redis/sonic-db/database_config.json \
      -t /usr/share/sonic/templates/critical_processes.j2
  program:redis
  program:redis_bmp
  |

After this fix, the output in /etc/supervisor/critical_processes is
correct and the error from docker-database is gone.
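
The malformed line above can be caught with a simple per-line check. The following is a minimal sketch, not the actual supervisor-proc-exit-listener logic: the `ENTRY_RE` pattern and the `malformed_entries` helper are hypothetical, and the assumption is that each non-empty line must hold exactly one `program:<name>` or `group:<name>` entry.

```python
import re

# Hypothetical pattern (an assumption, not taken from SONiC sources):
# exactly one "program:<name>" or "group:<name>" entry per line.
ENTRY_RE = re.compile(r"^(program|group):[^\s:]+$")


def malformed_entries(text):
    """Return the non-empty lines that are not a single well-formed entry."""
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not ENTRY_RE.match(line.strip())
    ]


# Rendered output before and after the template fix, as quoted above.
before = "program:redisprogram:redis_bmp\n"
after = "program:redis\nprogram:redis_bmp\n"

print(malformed_entries(before))  # the fused two-item line is reported
print(malformed_entries(after))   # nothing is reported
```

A check like this could run against the rendered file in a unit test, so that a missing linefeed is caught at build time rather than in syslog.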
Why I did it
Bugfix for Yang model of BGP Allowed Prefix.

Support optional NEIGHBOR_TYPE in key.
Support optional le and ge in prefixes_v4/prefixes_v6 list (e.g., 10.20.30.0/24 le 30).
Work item tracking
Microsoft ADO (number only): 30001113
How I did it
Updated sonic-bgp-allowed-prefix.yang.

Define optional value NEIGHBOR_TYPE in key.
Define type bgp-allowed-ipv4-prefix and bgp-allowed-ipv6-prefix to support the optional suffix in prefixes_v4/prefixes_v6 list.
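
As an illustration of the entry format described above (a prefix with an optional `ge`/`le` suffix, e.g. `10.20.30.0/24 le 30`), here is a minimal parsing sketch. It is not the pattern used in sonic-bgp-allowed-prefix.yang: the `PREFIX_RE` grammar, the `parse_allowed_prefix` helper, and the range check are assumptions made for this example.

```python
import ipaddress
import re

# Hypothetical grammar for illustration: "<prefix> [ge <len>] [le <len>]".
PREFIX_RE = re.compile(
    r"^(?P<prefix>\S+)(?:\s+ge\s+(?P<ge>\d+))?(?:\s+le\s+(?P<le>\d+))?$"
)


def parse_allowed_prefix(entry):
    """Split an entry into its network and optional ge/le bounds."""
    match = PREFIX_RE.match(entry.strip())
    if match is None:
        raise ValueError(f"unrecognized entry: {entry!r}")
    network = ipaddress.ip_network(match.group("prefix"), strict=True)
    bounds = {}
    for key in ("ge", "le"):
        if match.group(key) is not None:
            value = int(match.group(key))
            # Sanity check chosen for this sketch: the bound must lie between
            # the prefix length and the address family's maximum length.
            if not network.prefixlen <= value <= network.max_prefixlen:
                raise ValueError(f"{key} {value} out of range for {network}")
            bounds[key] = value
    return network, bounds


print(parse_allowed_prefix("10.20.30.0/24 le 30"))
print(parse_allowed_prefix("2001:db8::/32"))
```

Entries without a suffix return an empty bounds dictionary, which mirrors the "optional" wording in the description.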
How to verify it
Verified by UT:
…D automatically (sonic-net#20878)

#### Why I did it
src/sonic-platform-daemons
```
* b276e41 - (HEAD -> master, origin/master, origin/HEAD) [SmartSwitch] Extend implementation of the DPU chassis daemon. (sonic-net#563) (9 hours ago) [Oleksandr Ivantsiv]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Update EZB files to version 1.09 to support SAI 1.14.0.2 for ac3x(armhf)
Update EZB files to version 1.09 to support SAI 1.14.0.3 for ac5x(arm64)
* Add stp submodule

* Changing the sonic-stp repo url
* [Marvell] Falcon 3.2T HwSku support

Signed-off-by: Rajkumar P R <[email protected]>
…ffic script (sonic-net#20635)

[SmartSwitch] Added inbound traffic capability for DPU management traffic script
Fix nvidia smartswitch build pipeline

Signed-off-by: Prabhat Aravind <[email protected]>
Why I did it
BMP instance should not be launched on DPU database.

Work item tracking
Microsoft ADO (number only):
How I did it
Added a condition check to prevent the bmp instance from being instantiated on DPU database instances.

How to verify it
Verified locally on the KVM NPU platform.
…SIS_APP_DB (sonic-net#20369)

Modify database.sh to create an initial SYSTEM_LAG_IDS_FREE_LIST in the CHASSIS_APP_DB on the SUP during database-chassis startup.
Modify the database consistency check in swss.sh to append the lagid to the end of SYSTEM_LAG_IDS_FREE_LIST when a lagid is released.
Set lag_id_end=1023 (not 1024) in chassisdb.conf, since the largest lagid that BCM supports is 1023.
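
The free-list behaviour described above can be sketched in a few lines. This is an in-memory illustration only, not the Redis-backed logic in database.sh or swss.sh: the helper names, the starting ID of 1, and allocation from the front of the list are assumptions. Only the end value of 1023 and the append-on-release rule come from the description.

```python
from collections import deque

LAG_ID_START = 1     # assumed for illustration
LAG_ID_END = 1023    # largest lagid per the description above


def initial_free_list(start=LAG_ID_START, end=LAG_ID_END):
    """Build the initial free list holding every lagid in [start, end]."""
    return deque(range(start, end + 1))


def allocate(free_list):
    """Take the next lagid from the front of the list (assumed policy)."""
    return free_list.popleft()


def release(free_list, lag_id):
    """Append a released lagid to the end of the list."""
    free_list.append(lag_id)


free = initial_free_list(1, 4)
first = allocate(free)
release(free, first)
print(list(free))  # the released ID is now last in line for reuse
```

Appending released IDs to the end delays their reuse, so a recently freed lagid is not handed out again until the rest of the list has been consumed.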

Signed-off-by: mlok <[email protected]>
Why I did it
Build bmp container into sonic-buildimage, and added relevant daemon/file handling.

Work item tracking
Microsoft ADO (number only):27588893
How I did it
Build bmp container into sonic-buildimage, and added relevant daemon/file handling.

How to verify it
Built locally and verified on a lab DUT.
…ly (sonic-net#20892)

#### Why I did it
src/sonic-bmp
```
* a2d576b - (HEAD -> master, origin/master, origin/HEAD) Update README.md to add azure pipeline status link (12 hours ago) [Feng-msft]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…lly (sonic-net#20894)

#### Why I did it
src/sonic-swss
```
* eda63a9b - (HEAD -> master, origin/master, origin/HEAD) Vlanmgrd handling of portchannel does not exist more gracefully. (sonic-net#3367) (6 hours ago) [abdosi]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…omatically (sonic-net#20854)

#### Why I did it
src/sonic-swss-common
```
* ebd2afb - (HEAD -> master, origin/master, origin/HEAD) Supports FRR-VRRP configuration (sonic-net#813) (25 hours ago) [Philo]
* fe30ccd - [DASH] Add DASH Meter Policy , Rule , Counter table definitions (sonic-net#949) (2 days ago) [Sundara Gurunathan]
* 901f3b4 - [common] enable redispipeline to only publish after flush (sonic-net#895) (3 days ago) [Yijiao Qin]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Run pipeline all mgmt tests with this SAI version on a T2 testbed.
Why I did it
Keep using syslog for bmp since no large output.

How I did it
Revert previous bmp log change

How to verify it
revert change, pending verification pass.
cscarpitta and others added 30 commits January 21, 2025 14:48
Adding FRR CLI to support SRv6 static. The HLD for the feature is available at sonic-net/SONiC#1860

Signed-off-by: Carmine Scarpitta <[email protected]>
Why I did it
To support the addition of two new tables in CONFIG_DB, i.e. SRV6_MY_SIDS and SRV6_MY_LOCATORS, in order to allow configuration for SRv6 in SONiC.

Work item tracking
Microsoft ADO (number only): 30513277

How I did it
I define the YANG model based on SRv6 HLD.

How to verify it
Run the unit tests and build image.
Why I did it
If a critical process crashes or is killed, the bmp docker container is not auto-restarted.

How I did it
/usr/bin/supervisor-proc-exit-listener is in charge of critical process monitoring and event publishing, so it should be autorestarted in any case. Otherwise there may be an issue if supervisor-proc-exit-listener crashes, or in test cases such as
"docker exec bmp kill -SIGKILL -1", where critical process handling may not work correctly under a race condition (depending on whether supervisor-proc-exit-listener is the last one to be killed).

When a container's processes receive SIGKILL, the order in which they are actually terminated can depend on scheduling and resource availability within the container.

If supervisor-proc-exit-listener is killed before the critical process, the container auto-restart is not triggered as expected.
…t#21366)

Why I did it
Use debian mirror snapshot instead of debian version pinning.
Because debian version pinning can't handle package uninstallation scenario.
…atically (sonic-net#21420)

#### Why I did it
src/sonic-snmpagent
```
* 9e2c50a - (HEAD -> master, origin/master, origin/HEAD) Fix snmp agent not-responding issue when high CPU utilization (sonic-net#345) (2 hours ago) [Jianquan Ye]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…tomatically (sonic-net#21416)

#### Why I did it
src/sonic-linux-kernel
```
* 416e7a4 - (HEAD -> master, origin/master, origin/HEAD) Fix optoe's write_max when using native i2c driver (sonic-net#407) (6 hours ago) [Prince George]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…lly (sonic-net#21422)

#### Why I did it
src/sonic-swss
```
* 4eb74f00 - (HEAD -> master, origin/master, origin/HEAD) [orchagent] Fix: ERR swss#orchagent: :- setPortPvid: pvid setting for tunnel Port_EVPN_XXX is not allowed (sonic-net#3402) (9 hours ago) [Brad House]
```
#### How I did it
#### How to verify it
#### Description for the changelog
sonic-net#21355)

Why I did it
It's one part of the fixes of sonic-net#21314
SNMP walker request will always timeout when 100% CPU utilization.

Work item tracking
Microsoft ADO 30112399:

How I did it
Enable SNMP dynamic frequency on packet chassis.

How to verify it
snmp/test_snmp_cpu.py(https://github.com/sonic-net/sonic-mgmt/blob/master/tests/snmp/test_snmp_cpu.py) tests the scenario.
Why I did it
After docker-syncd-brcm-dnx-rpc is moved to bookworm in master, the libthrift*.so is not installed inside the syncd docker and the syncd process fails to come up.

Work item tracking
Microsoft ADO (number only):
How I did it
Installed libthrift-0.17.0

How to verify it
Verified that the syncd and swss dockers stay up and are able to run QoS tests.
Why I did it
Improve the t1 config to align with YANG validation

How I did it
Add missing leafref and mandatory field to the config

How to verify it
YANG validation check on generated config
libthrift did not get installed in the Broadcom syncd RPC container. However, syncd-rpc requires it.
…utomatically (sonic-net#21437)

#### Why I did it
src/sonic-host-services
```
* 0430ada - (HEAD -> master, origin/master, origin/HEAD) Add implementation for DockerService.List (sonic-net#199) (16 hours ago) [Dawei Huang]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Add an additional platform to the SONiC support list

Work item tracking
Microsoft ADO (number only):
How I did it
Added necessary platform configurations and identification logic.
Some iterations are still necessary on those.

How to verify it
An image containing this PR and the necessary driver changes should end up with links up.

Which release branch to backport (provide reason below if selected)
 202411
 msft-202412
Description for the changelog
Add initial support for Moby platform
Why I did it
Fix front panel LEDs for Quicksilver
Fix fan LEDs for Quicksilver
Add Moby platform
Work item tracking
Microsoft ADO (number only):
How I did it
Updated Arista platform submodules
- Why I did it
Update SAI Version SAIBuild245.3..13

- How I did it
Upload SAI artifact and update mlnx-sai.mk file

- How to verify it
Run sonic-mgmt tests
…omatically (sonic-net#21449)

#### Why I did it
src/sonic-swss-common
```
* 5a4b4a5 - (HEAD -> master, origin/master, origin/HEAD) C API Exceptions (sonic-net#967) (2 hours ago) [erer1243]
* b58a501 - Add swss::Table to c api (sonic-net#964) (12 hours ago) [erer1243]
```
#### How I did it
#### How to verify it
#### Description for the changelog
…lly (sonic-net#21375)

#### Why I did it
src/sonic-gnmi
```
* b017531 - (HEAD -> master, origin/master, origin/HEAD) Fix roles checking if cname not exist (sonic-net#339) (20 hours ago) [ganglv]
* 3afc927 - Better modularization for gnoi_client.go (5 days ago) [Dawei Huang]
|\ 
| * 6059b18 - finish package sonic module. (6 days ago) [Dawei Huang]
| * 6867664 - finish packaging system module. (6 days ago) [Dawei Huang]
| * 1f77986 - seperate util and system module. (6 days ago) [Dawei Huang]
| * 35df113 - seperate flags into a package config. (6 days ago) [Dawei Huang]
| * 1405cd2 - format and naming clean up for gnoi_client.go. (7 days ago) [Dawei Huang]
* aa547ad - Improve GNMI service to limit API access by role (sonic-net#335) (6 days ago) [ganglv]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Fixed issue : sonic-net#20575

Why I did it
"config-reload" in dualtor topologies was failing due to the absence of a TC_TO_DSCP YANG model.
The failure was seen after PR sonic-net/sonic-utilities#3102

How to verify it
Step-1: On the DUT, add the YANG file at "/usr/local/yang-models/sonic-tc-dscp-map.yang".
Step-2: config reload -y

Tested branch (Please provide the tested image version)
 202411

Description for the changelog
Adding YANG model for TC_TO_DSCP_MAP along with test files.
- Why I did it
Add SAI_KEY_SPC5_LOSSY_SCHEDULING=1 and
SAI_DEFAULT_SWITCHING_MODE_STORE_FORWARD=1
to SN5640-simx sai.profile, to support those features in the SKU

- How I did it
Added parameters to file

- How to verify it
Deploy SKU and check SAI logs
…ers and DSCP mapping (sonic-net#21427)

- Why I did it
To update buffers for Mellanox-SN5600-C256S1 and Mellanox-SN5600-C224O8

- How I did it
Update proper files - qos.json.j2, buffer_defaults_objects.j2, buffers_defaults_t0/t1.j2

- How to verify it
Run test and check SDK dumps values
Why I did it
sonic-buildimage repo integrates submodule code into the image by creating submodule advance PRs. However, these PRs often fail, and as time passes or the number of submodule PRs increases, identifying which PR caused the test failure becomes increasingly challenging. To address this, we need a pipeline capable of quickly building a VS image based on specific submodule commit IDs and precisely running the failed tests. This approach will help us efficiently pinpoint the submodule PR responsible for the failure.

How I did it
Create a pipeline that builds a VS image and runs tests. Users can build a VS image and run tests based on a specific submodule name and commit ID, branch, topology, test scripts, and features.
… automatically (sonic-net#21460)

#### Why I did it
src/sonic-platform-common
```
* 75c320d - (HEAD -> master, origin/master, origin/HEAD) Change Virtium SSD which doesn't support SmartCMD, to use only smartctl (sonic-net#522) (4 hours ago) [Noa Or]
```
#### How I did it
#### How to verify it
#### Description for the changelog
)

Why I did it
This PR is a temporary change; once the rshim interface is replaced, it will no longer be required.

Mount the dbus socket in the pmon container, because the systemctl command has to be executed from the PMON container to start/stop the service during admin state/reboot command execution.

dockers/docker-platform-monitor/Dockerfile.j2 - Addition of dbus package for mellanox specific platform in order to use dbus-send command
files/build_templates/docker_image_ctl.j2 - Mount socket, since we need to use the systemctl command to start/stop service from pmon container

How I did it
How to verify it
dbus-send commands in Pmon container can be performed in order to start / stop the [email protected] which is relevant for starting or stopping the rshim service
To have the latest DASH bmv2 pipeline and DASH libsai library for DPU KVM image.

Work item tracking
Microsoft ADO 30793749:
…e starting dash-engine (sonic-net#21452)

Why I did it
If the attached ports of dash-engine are not UP, the dash-engine will not be able to receive any packets.

The below is the log when starting up dash-engine without the attached ports being UP:

Calling target program-options parser
Adding interface eth1 as port 0
[09:18:14.056] [bmv2] [D] [thread 7] Adding interface eth1 as port 0
[09:18:14.102] [bmv2] [E] [thread 7] Add port operation failed
Adding interface eth2 as port 1
[09:18:14.102] [bmv2] [D] [thread 7] Adding interface eth2 as port 1
[09:18:14.150] [bmv2] [E] [thread 7] Add port operation failed

Work item tracking
Microsoft ADO 30887888:

How I did it
If the attached ports of dash-engine are not UP, bring them UP before starting dash-engine.
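
The "bring the ports UP first" step can be sketched as a small wait loop. This is not the script from the PR: `wait_for_ports_up` and its parameters are hypothetical, and the state check and bring-up action are injected as callables so the sketch runs without touching real interfaces.

```python
import time


def wait_for_ports_up(ports, is_up, bring_up, retries=10, delay=0.5,
                      sleep=time.sleep):
    """Bring up any port that is down, then wait until it reports UP.

    is_up(port) and bring_up(port) are supplied by the caller. On a real
    system they might read /sys/class/net/<port>/operstate and run
    "ip link set <port> up" (an assumption, not taken from the PR).
    """
    for port in ports:
        if is_up(port):
            continue
        bring_up(port)
        for _ in range(retries):
            if is_up(port):
                break
            sleep(delay)
        else:
            raise TimeoutError(f"{port} did not come up")


# Demo with an in-memory stand-in for interface state.
state = {"eth1": False, "eth2": True}
wait_for_ports_up(
    ["eth1", "eth2"],
    is_up=lambda port: state[port],
    bring_up=lambda port: state.update({port: True}),
    sleep=lambda _seconds: None,
)
print(state)
```

Only after a call like this returns without raising would the engine be started, which avoids the "Add port operation failed" errors shown in the log above.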

How to verify it
The dash-engine runs with ports added successfully.

Adding interface eth1 as port 0
[09:40:16.810] [bmv2] [D] [thread 11] Adding interface eth1 as port 0
Adding interface eth2 as port 1
[09:40:16.863] [bmv2] [D] [thread 11] Adding interface eth2 as port 1
Fix sonic-net#20284

In 202405 and above, two extra steps are added before the start of every container which checks NUM_DPU and IS_DPU_DEVICE by parsing the platform.json file using the jq tool. This is only relevant for Smartswitch. However, this is adding some delay during the reconciliation phase of WR/FR resulting

How I did it
Set the environment variables for systemd by systemd-sonic-generator.

Signed-off-by: Ze Gan <[email protected]>
…iple ptf nn agents connection (sonic-net#21070)

When testing SONiC with the ptf dataplane connected to multiple ptf nn agents, some cases fail because the packet queues in ptf are not polled thoroughly. This is a bug or missing feature in ptf: p4lang/ptf#207
As a short-term quick fix, this PR patches the ptf-py3 package to unblock our qualification process.
- Why I did it
To have the right sensors.conf file for SN5640 SIMX

- How I did it
Updated sensors.conf file under SN5640-SIMX platform folder

- How to verify it
run 'sensors' command on Mellanox SN5640 SIMX, and make sure no errors in output
flashrom recently started failing to build with the below error:
```
Cloning into 'flashrom-0.9.7'...
/sonic/src/flashrom/flashrom-0.9.7 /sonic/src/flashrom
fatal: 'tags/0.9.7' is not a commit and a branch 'flashrom-src' cannot be created from it
```
Nothing in sonic-buildimage has changed in relation to this so presumably
flashrom upstream renamed their tags.

This commit just fixes the formatting of the tag name to use the new format.

Signed-off-by: Brad House (@bradh352)