[poed]Merging new poed init flow and error handling rules #19

Merged
merged 13 commits into from
Feb 8, 2022

Conversation

leonchiang
Contributor

@leonchiang leonchiang commented Dec 29, 2021

  1. Merge the new poed init flow and error-handling rules.
  2. Add config-related commands to poecli.
  3. Fix some I2C communication issues.
  4. Fix the exclusive lock flow.

leonchiang and others added 10 commits December 29, 2021 19:06
…vironment

1. Change the service file permission (-x).
2. Add the PYTHONUNBUFFERED=1 environment variable so that stdout output is dumped sooner.

Signed-off-by: leon.chiang <[email protected]>
1. Modify the exclusive lock to retry 5 times (0.1 s delay); set a lock flag if locking succeeds,
   then check the lock flag before executing the wrapped function.
2. Move some constant definitions from poed to poe_common.py.

Signed-off-by: leon.chiang <[email protected]>
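
As a rough illustration of the retry-based exclusive lock described in the commit message above, the sketch below wraps a function in a decorator that tries a non-blocking file lock up to 5 times with a 0.1 s delay, records a lock flag, and only calls the wrapped function when the flag is set. The lock file path, the names `exclusive_lock` and `set_port`, and the use of `fcntl.flock` are assumptions for illustration; the actual poed code is not shown in this thread and may differ.

```python
import fcntl
import functools
import os
import tempfile
import time

# Hypothetical lock file path; the real poed lock location is not shown in this PR.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "poe_example.lock")
LOCK_RETRIES = 5      # retry count taken from the commit message
LOCK_DELAY_S = 0.1    # delay between retries taken from the commit message


def exclusive_lock(func):
    """Run func only if an exclusive file lock could be acquired."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        locked = False  # the "lock flag"
        with open(LOCK_PATH, "w") as lock_file:
            for _ in range(LOCK_RETRIES):
                try:
                    fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    locked = True
                    break
                except OSError:
                    time.sleep(LOCK_DELAY_S)
            if not locked:
                return None  # lock never acquired: skip the wrapped function
            try:
                return func(*args, **kwargs)
            finally:
                fcntl.flock(lock_file, fcntl.LOCK_UN)
    return wrapper


@exclusive_lock
def set_port(port, enable):
    # Placeholder for a chip access that must not run concurrently.
    return (port, enable)
```

Returning `None` when the lock cannot be taken is a design choice of this sketch; a caller could equally raise an exception or return an error code.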
1. Add a new subcommand "poecli cfg" for manipulating config files.
2. Add a busy flag between poed and poecli to prevent incorrect settings while poed is still initializing.
3. Change poecli commands:
   3.1. save->savechip
   3.2. restore->restore_poe_system
4. Fix the reconstruct flow when the persistent config file is lost.
5. Stop the 30 s persistent auto-save loop.

Signed-off-by: leon.chiang <[email protected]>
1. Remove "Enable All ports" from the init function.
2. Remove save_system_settings from the platform init function.

Signed-off-by: leon.chiang <[email protected]>
1. Add an I2C bus clear delay in the TX/RX retry loop.
2. Add a define for the 300 ms chip reset delay.
3. Print the retry packet content for debugging.
4. Remove some spaces in lines (code cleanup).
5. Add a 30 ms delay between command reports and the next command (Key=0x00).

Signed-off-by: leon.chiang <[email protected]>
…o stderr

1. Fix poecli exit code passing issue.

    Ex:
        root@localhost:~# poecli set -p 0 -e 1 > /dev/null
        usage: poecli.py set [-h] -p <val> [-e <val>] [-l <val>] [-o <val>]
        poecli.py set: error: argument -p/--ports: Invalid port inputs: '0'.
        root@localhost:~# echo $?
        2
        root@localhost:~# poecli set -p 1 -e 1 > /dev/null
        root@localhost:~# echo $?
        0
2. Redirect all exception output to stderr.

Signed-off-by: leon.chiang <[email protected]>
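
The exit-code behaviour shown in the example above (status 2 for an invalid port, 0 on success) matches what Python's `argparse` does when an argument's type function rejects the input. A minimal sketch, not the actual poecli source: the program name `poecli_example`, the long option `--enable`, and the rule that port numbers start at 1 are assumptions.

```python
import argparse
import sys


def port_list(text):
    """Parse a comma-separated port list; ports are assumed to start at 1."""
    ports = []
    for item in text.split(","):
        if not item.isdigit() or int(item) < 1:
            raise argparse.ArgumentTypeError("Invalid port inputs: '%s'." % text)
        ports.append(int(item))
    return ports


def build_parser():
    parser = argparse.ArgumentParser(prog="poecli_example")
    sub = parser.add_subparsers(dest="command")
    set_cmd = sub.add_parser("set")
    set_cmd.add_argument("-p", "--ports", type=port_list, required=True,
                         metavar="<val>")
    # "--enable" is a hypothetical long name; the thread only shows "-e".
    set_cmd.add_argument("-e", "--enable", type=int, choices=(0, 1),
                         metavar="<val>")
    return parser


def main(argv):
    # argparse prints the error to stderr and exits with status 2 on bad input.
    args = build_parser().parse_args(argv)
    if args.command != "set":
        print("nothing to do", file=sys.stderr)
        return 1
    # A real tool would talk to poed here; the sketch only reports success.
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Passing the return value of `main` to `sys.exit` is what lets the shell see the status through `echo $?`.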
1. Change the platform init flow in the PoE agent and all supported platforms.
2. On startup, clear the read buffer to resolve protocol KEY mismatches during platform initialization.
3. Quickly compare the current active matrix with the platform default matrix; skip setting/programming the active matrix if they match.
4. Add driver support for reading the active matrix (parser/return value).

Signed-off-by: leon.chiang <[email protected]>
Add a failsafe mode: when loading/applying cfg settings fails, all ports are disabled.

Signed-off-by: leon.chiang <[email protected]>
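
The failsafe idea in the commit message above can be sketched as a small wrapper: try to apply the configuration and, on any failure, fall back to disabling every port. The function name `apply_cfg_with_failsafe` and both callbacks are hypothetical; how poed actually applies settings and disables ports is not shown in this thread.

```python
def apply_cfg_with_failsafe(apply_cfg, disable_all_ports, log=print):
    """Apply a config; on any failure fall back to disabling every port."""
    try:
        apply_cfg()
        return True
    except Exception as exc:
        # Failsafe: a half-applied config is treated as unsafe.
        log("WARN: apply config failed: %s; entering failsafe mode" % exc)
        disable_all_ports()
        return False
```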
1. Add return codes for set commands.
2. Add return code structure parsing in the init_poe flow (all platforms will return detailed results).
3. Modify the port setting checking rules (set_all_params).
4. Add a traceback function to identify exceptions in detail.

Signed-off-by: leon.chiang <[email protected]>
1. Fix echo byte regeneration on communication retry.
2. Read 15 bytes when the I2C bus is initialized on all platforms, to avoid reading a mismatched packet.
3. Suppress the log message when the exclusive lock succeeds.
4. Skip power limit result checking in 802.3BT mode.

Signed-off-by: leon.chiang <[email protected]>
1. Skip reading all ports' enDis state for BT-type chips (not supported).
2. Print the recv buffer when retrying _communication.

Signed-off-by: leon.chiang <[email protected]>
@rothcar rothcar self-assigned this Dec 31, 2021
@rothcar rothcar self-requested a review December 31, 2021 01:03
@rothcar rothcar removed their assignment Dec 31, 2021
Contributor

@rothcar rothcar left a comment

I've tested this locally on an Accton BT-capable system and the results look good. We do need additional reviewers, though, specifically from Accton.

@akenliu

akenliu commented Jan 3, 2022

There are many commits. Does this mean these commits fix issues #9~#18?

@leonchiang
Contributor Author

Yes, these commits include hotfixes for issues #9~#18.

@WillyLiu-EC
Contributor

Hi @leonchiang,

Some questions:
Q1. A failure message, shown below, appears after power-on.
[ OK ] Started Update UTMP about System Runlevel Changes.
[ OK ] Stopped DentOS POE Agent.
[ OK ] Started DentOS POE Agent.
[ OK ] Stopped DentOS POE Agent.
[ OK ] Started DentOS POE Agent.
[ OK ] Stopped DentOS POE Agent.
[ OK ] Started DentOS POE Agent.
[ OK ] Stopped DentOS POE Agent.
[ OK ] Started DentOS POE Agent.
[ OK ] Stopped DentOS POE Agent.
[FAILED] Failed to start DentOS POE Agent.
See 'systemctl status poed.service' for details.

DENT OS DENTOS-HEAD, 2021-10-14.16:01-3d75b42
localhost login: root
Password:
Last login: Thu Nov 3 17:17:27 UTC 2016 on ttyS0
Linux localhost 5.10.4 #1 SMP PREEMPT Thu Oct 14 16:09:41 UTC 2021 aarch64
root@localhost:~# ^C
root@localhost:~# systemctl status poed.service > poed_service.log
root@localhost:~# cat poed_service.log
● poed.service - DentOS POE Agent
Loaded: loaded (/lib/systemd/system/poed.service; enabled; vendor preset: enabled)
Active: failed (Result: start-limit-hit) since Thu 2016-11-03 17:16:47 UTC; 9min ago
Process: 893 ExecStart=/usr/sbin/poed (code=exited, status=0/SUCCESS)
Main PID: 893 (code=exited, status=0/SUCCESS)

Nov 03 17:16:47 localhost systemd[1]: poed.service: Service hold-off time over, scheduling restart.
Nov 03 17:16:47 localhost systemd[1]: Stopped DentOS POE Agent.
Nov 03 17:16:47 localhost systemd[1]: poed.service: Start request repeated too quickly.
Nov 03 17:16:47 localhost systemd[1]: Failed to start DentOS POE Agent.
Nov 03 17:16:47 localhost systemd[1]: poed.service: Unit entered failed state.
Nov 03 17:16:47 localhost systemd[1]: poed.service: Failed with result 'start-limit-hit'.

Q2: Can you provide a user guide and a test log for the "poecli cfg" related commands? Then we can test other DUTs according to your user guide.

@leonchiang
Contributor Author

leonchiang commented Jan 5, 2022

A1:
Can you provide your logs from "/var/log/syslog" and the runtime config in "/run/poe_runtime_cfg.json" (or the persistent config) for debugging?
Alternatively, you can use "journalctl -u poed" for more logs:

root@localhost:~# stty cols 300 rows 45  # For more columns in console
root@localhost:~# journalctl -u poed
-- Logs begin at Mon 2022-01-03 07:20:09 UTC, end at Wed 2022-01-05 13:45:01 UTC. --
Jan 03 07:20:11 localhost systemd[1]: Started DentOS POE Agent.
Jan 03 07:20:12 localhost poed.py[822]: INFO: Configure PoE ports from "/etc/poe_agent/poe_perm_cfg.json"
Jan 03 07:20:12 localhost poed[808]: Select 2-Pair mode
Jan 03 07:20:13 localhost poed[808]: Port map match, skip program global matrix
Jan 03 07:20:14 localhost poed.py[822]: INFO: init_poe all_result: 0
Jan 03 07:20:14 localhost poed.py[822]: INFO: Success to initialize platform PoE settings!
Jan 03 07:20:14 localhost poed.py[822]: INFO: Get POE Chip version: 22.2.1.1
Jan 03 07:20:20 localhost poed.py[822]: INFO: Success to restore port configurations from "/etc/poe_agent/poe_perm_cfg.json".
Jan 03 07:20:20 localhost poed.py[822]: INFO: Start autosave thread

A2: We do have a help command, "poecli cfg -h":

root@localhost:~# poecli cfg -h
usage: poecli.py cfg [-h] [-s] [-l] [-c <val>]

optional arguments:
  -h, --help            show this help message and exit
  -s, --save            Save current runtime settings to persistent file.
  -l, --load            Load settings from persistent file.
  -c <val>, --config <val>
                        Assign file path for save/load operation,
                        instead of persistent config, Example:
                        poecli cfg -s -c [Config Path]

@WillyLiu-EC
Contributor

Hi Leo,
We used "journalctl -u poed" to check. It seems to be a config loading problem.
root@localhost:~# journalctl -u poed > journalctl_poed.log
root@localhost:~# cat journalctl_poed.log
-- Logs begin at Wed 2080-10-30 05:44:24 UTC, end at Wed 2080-10-30 05:54:01 UTC. --
Oct 30 05:44:27 localhost systemd[1]: Started DentOS POE Agent.
Oct 30 05:44:27 localhost poed.py[811]: INFO: Configure PoE ports from "/etc/poe_agent/poe_perm_cfg.json"
Oct 30 05:44:27 localhost poed.py[811]: WARN: Load config failed: 'NoneType' object is not subscriptable
Oct 30 05:44:27 localhost systemd[1]: poed.service: Service hold-off time over, scheduling restart.
Oct 30 05:44:27 localhost systemd[1]: Stopped DentOS POE Agent.
Oct 30 05:44:27 localhost systemd[1]: Started DentOS POE Agent.
Oct 30 05:44:27 localhost poed.py[859]: INFO: Configure PoE ports from "/etc/poe_agent/poe_perm_cfg.json"
Oct 30 05:44:27 localhost poed.py[859]: WARN: Load config failed: 'NoneType' object is not subscriptable
Oct 30 05:44:28 localhost systemd[1]: poed.service: Service hold-off time over, scheduling restart.
Oct 30 05:44:28 localhost systemd[1]: Stopped DentOS POE Agent.
Oct 30 05:44:28 localhost systemd[1]: Started DentOS POE Agent.
Oct 30 05:44:28 localhost poed.py[867]: INFO: Configure PoE ports from "/etc/poe_agent/poe_perm_cfg.json"
Oct 30 05:44:28 localhost poed.py[867]: WARN: Load config failed: 'NoneType' object is not subscriptable
Oct 30 05:44:28 localhost systemd[1]: poed.service: Service hold-off time over, scheduling restart.
Oct 30 05:44:28 localhost systemd[1]: Stopped DentOS POE Agent.
Oct 30 05:44:28 localhost systemd[1]: Started DentOS POE Agent.
Oct 30 05:44:28 localhost poed.py[875]: INFO: Configure PoE ports from "/etc/poe_agent/poe_perm_cfg.json"
Oct 30 05:44:28 localhost poed.py[875]: WARN: Load config failed: 'NoneType' object is not subscriptable
Oct 30 05:44:29 localhost systemd[1]: poed.service: Service hold-off time over, scheduling restart.
Oct 30 05:44:29 localhost systemd[1]: Stopped DentOS POE Agent.
Oct 30 05:44:29 localhost systemd[1]: Started DentOS POE Agent.
Oct 30 05:44:29 localhost poed.py[883]: INFO: Configure PoE ports from "/etc/poe_agent/poe_perm_cfg.json"
Oct 30 05:44:29 localhost poed.py[883]: WARN: Load config failed: 'NoneType' object is not subscriptable
Oct 30 05:44:29 localhost systemd[1]: poed.service: Service hold-off time over, scheduling restart.
Oct 30 05:44:29 localhost systemd[1]: Stopped DentOS POE Agent.
Oct 30 05:44:29 localhost systemd[1]: poed.service: Start request repeated too quickly.
Oct 30 05:44:29 localhost systemd[1]: Failed to start DentOS POE Agent.
Oct 30 05:44:29 localhost systemd[1]: poed.service: Unit entered failed state.
Oct 30 05:44:29 localhost systemd[1]: poed.service: Failed with result 'start-limit-hit'.
root@localhost:~#
Oct 30 05:44:27 localhost systemd[1]: poed.service: Service hold-off time over, scheduling restart.
Oct 30 05:44:27 localhost systemd[1]: Stopped DentOS POE Agent.
Oct 30 05:44:27 localhost systemd[1]: Started DentOS POE Agent.
Oct 30 05:44:27 localhost poed.py[859]: INFO: Configure PoE ports from "/etc/poe_agent/poe_perm_cfg.json"
Oct 30 05:44:27 localhost poed.py[859]: WARN: Load config failed: 'NoneType' object is not subscriptable
Oct 30 05:44:28 localhost systemd[1]: poed.service: Service hold-off time over, scheduling restart.
Oct 30 05:44:28 localhost systemd[1]: Stopped DentOS POE Agent.
Oct 30 05:44:28 localhost systemd[1]: Started DentOS POE Agent.
Oct 30 05:44:28 localhost dhclient[612]: DHCPDISCOVER on ma1 to 255.255.255.255 port 67 interval 6
Oct 30 05:44:28 localhost poed.py[867]: INFO: Configure PoE ports from "/etc/poe_agent/poe_perm_cfg.json"
Oct 30 05:44:28 localhost poed.py[867]: WARN: Load config failed: 'NoneType' object is not subscriptable
Oct 30 05:44:28 localhost systemd[1]: poed.service: Service hold-off time over, scheduling restart.
Oct 30 05:44:28 localhost systemd[1]: Stopped DentOS POE Agent.
Oct 30 05:44:28 localhost systemd[1]: Started DentOS POE Agent.
Oct 30 05:44:28 localhost poed.py[875]: INFO: Configure PoE ports from "/etc/poe_agent/poe_perm_cfg.json"
Oct 30 05:44:28 localhost poed.py[875]: WARN: Load config failed: 'NoneType' object is not subscriptable
Oct 30 05:44:29 localhost systemd[1]: poed.service: Service hold-off time over, scheduling restart.
Oct 30 05:44:29 localhost systemd[1]: Stopped DentOS POE Agent.
Oct 30 05:44:29 localhost systemd[1]: Started DentOS POE Agent.
Oct 30 05:44:29 localhost poed.py[883]: INFO: Configure PoE ports from "/etc/poe_agent/poe_perm_cfg.json"
Oct 30 05:44:29 localhost poed.py[883]: WARN: Load config failed: 'NoneType' object is not subscriptable
Oct 30 05:44:29 localhost systemd[1]: poed.service: Service hold-off time over, scheduling restart.
Oct 30 05:44:29 localhost systemd[1]: Stopped DentOS POE Agent.
Oct 30 05:44:29 localhost systemd[1]: poed.service: Start request repeated too quickly.
Oct 30 05:44:29 localhost systemd[1]: Failed to start DentOS POE Agent.
Oct 30 05:44:29 localhost systemd[1]: poed.service: Unit entered failed state.
Oct 30 05:44:29 localhost systemd[1]: poed.service: Failed with result 'start-limit-hit'.

We never see "/run/poe_runtime_cfg.json", and my "poe_perm_cfg.json" is empty.

@leonchiang
Contributor Author

leonchiang commented Jan 6, 2022

Hi @WillyLiu-EC:
By design, the config file should never exist but be empty. An empty file raises an exception while the config is loaded, and poed exits with an error code for safety.
If an undefined or non-JSON file is placed at this path (or the file is corrupted), the service cannot start. You must delete it (or back it up) and regenerate a valid persistent config file with the "poecli cfg -s" command:

1. systemctl stop poed (if the service is still running)
2. rm /etc/poe_agent/poe_perm_cfg.json
3. systemctl start poed
4. sleep 5 (wait about 5 s for the service to come up)
5. poecli cfg -s (save the current chip configs)
6. ls -la /etc/poe_agent/poe_perm_cfg.json (confirm the save step completed)
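
The failure mode described above (a config file that exists but is empty or not JSON) can be checked for before starting the service. A minimal sketch, assuming the persistent config is a JSON object at the top level; the function name `check_cfg_file` and the returned status strings are hypothetical, and the real poed loader may validate more (timestamps, platform info).

```python
import json
import os


def check_cfg_file(path):
    """Classify a config file as 'missing', 'empty', 'invalid' or 'ok'."""
    if not os.path.exists(path):
        return "missing"   # per this thread, poed can start without the file
    if os.path.getsize(path) == 0:
        return "empty"     # the case that caused the restart loop in this thread
    try:
        with open(path) as cfg_file:
            data = json.load(cfg_file)
    except ValueError:     # json.JSONDecodeError is a subclass of ValueError
        return "invalid"
    # Assumption: a valid config is a JSON object (dict) at the top level.
    return "ok" if isinstance(data, dict) else "invalid"
```

For example, `check_cfg_file("/etc/poe_agent/poe_perm_cfg.json")` could be run before step 3 to decide whether the file needs to be removed first.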

Additional commit: modify the exit function for easier debugging and a correct return code:

Jan 06 17:01:21 localhost systemd[1]: Started DentOS POE Agent.
Jan 06 17:01:21 localhost poed.py[20772]: INFO: Configure PoE ports from "/etc/poe_agent/poe_perm_cfg.json"
Jan 06 17:01:21 localhost poed.py[20772]: WARN: Load config failed: 'NoneType' object is not subscriptable
Jan 06 17:01:21 localhost poed[20766]: exitcode=-2
Jan 06 17:01:21 localhost systemd[1]: poed.service: Main process exited, code=exited, status=254/n/a
Jan 06 17:01:21 localhost systemd[1]: poed.service: Unit entered failed state.
Jan 06 17:01:21 localhost systemd[1]: poed.service: Failed with result 'exit-code'.
Jan 06 17:01:22 localhost systemd[1]: poed.service: Service hold-off time over, scheduling restart.
Jan 06 17:01:22 localhost systemd[1]: Stopped DentOS POE Agent.
Jan 06 17:01:22 localhost systemd[1]: poed.service: Start request repeated too quickly.
Jan 06 17:01:22 localhost systemd[1]: Failed to start DentOS POE Agent.
Jan 06 17:01:22 localhost systemd[1]: poed.service: Unit entered failed state.
Jan 06 17:01:22 localhost systemd[1]: poed.service: Failed with result 'exit-code'.

root@localhost:~# systemctl show poed --property=ExecMainStatus
ExecMainStatus=254
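
The log above shows poed printing exitcode=-2 while systemd records status=254. On POSIX systems only the low 8 bits of an exit status reach the parent process, so -2 is observed as 254. A small sketch that reproduces the mapping with a child Python interpreter (nothing here is poed code; the helper name `observed_exit_status` is made up for the example):

```python
import subprocess
import sys


def observed_exit_status(code):
    """Exit a child interpreter with `code`; return the status the parent sees."""
    child = subprocess.run(
        [sys.executable, "-c", "import sys; sys.exit(%d)" % code]
    )
    return child.returncode


if __name__ == "__main__":
    # On POSIX systems this prints the same value as (-2) & 0xFF.
    print(observed_exit_status(-2))
```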

@leonchiang leonchiang force-pushed the main branch 2 times, most recently from 1d4ddfb to f24abd7 January 6, 2022 10:33
1. Add exitcode printing for reason of exit
2. Change exit function usage.

Signed-off-by: leon.chiang <[email protected]>
@WillyLiu-EC
Contributor

Hi @leonchiang

We followed these commands and ran cat /etc/poe_agent/poe_perm_cfg.json. The save seems to have completed.
After that, no failure message appeared when we power cycled the DUT, so we think it works.

About this poe_perm_cfg.json: do we need to commit it? Where should we put the JSON file?

Best regards,
WillyLiu

@leonchiang
Contributor Author

Hi @WillyLiu-EC
Do you mean you want to put a default "/etc/poe_agent/poe_perm_cfg.json" file into the ONIE image?
The current code structure does not need a default /etc/poe_agent/poe_perm_cfg.json; users who need one
can create it with the poecli cfg -s command.
If you need a "Platform Default", you can modify poe_platform.py in your platform directory and
change the default "enDis/priority/powerLimit...etc" values by editing the init_poe function.

@WillyLiu-EC
Contributor

Hi @leonchiang,
As you say, with the current code structure the poe_perm_cfg.json file has to be created with the poecli cfg -s command.
But a failure message appears at first boot (as we discussed before).

How do users know that they need to create poe_perm_cfg.json? Is there a user guide?
In your view, shouldn't we put a default "/etc/poe_agent/poe_perm_cfg.json" file into the ONIE image?

@leonchiang
Contributor Author

leonchiang commented Jan 10, 2022

Hi @WillyLiu-EC:

  1. The failure message is caused by an invalid config file at that path. The PoE agent could not identify the file format, so it refused to start and returned exit code (-2) to the shell. We would like to know why an empty /etc/poe_agent/poe_perm_cfg.json exists on your system. Is it just a leftover file from your last test?

  2. The config file contains platform system info, set/save timestamps, and port settings. These settings may be incompatible between platforms, e.g. port count and power limit.

  3. In this change the autosave flow is different: "/etc/poe_agent/poe_perm_cfg.json" is no longer created automatically. It must be created with the user command "poecli cfg -s", which avoids user typos in the file (including timestamp errors and version number mismatches); otherwise the config rules would be a little too complex for a normal user.

  4. A factory reset (poecli restore_poe_system, suggested for the production flow) writes the platform default settings to the PoE chip's NV-MEMORY, where they are preserved through a cold boot (power cycle). When the PoE agent then starts, if /etc/poe_agent/poe_perm_cfg.json exists it is loaded into the PoE chip to change the "runtime" state. NV-MEMORY is not changed until the user runs poecli savechip to save the current state into NV-MEMORY.
    The storage hierarchy is: PoE NV-MEMORY (power-up) -> /etc/poe_agent/poe_perm_cfg.json (booting into the ONIE system) -> /run/poe_runtime_cfg.json (snapshot of the PoE chip's current state)

Normal production flow example:

  1. Use poecli restore_poe_system to restore the factory settings.
  2. Cold boot (power cycle) after step 1 is done (optional; or restart the poed agent if a warm boot is needed).
  3. Boot into the OS and use poecli cfg -s to save the current state to /etc/poe_agent/poe_perm_cfg.json, or set custom port settings first and then save poe_perm_cfg.json.
  4. Use poecli savechip to save the custom default settings to the poe chip's NV-MEMORY (optional).
  5. Done.

You can create your own production flow or boot script, depending on your requirements, by combining poecli subcommands.
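
As a rough sketch of such a script, the steps above could be chained as follows. The three poecli subcommands are the ones named in this thread; the helper names and the injectable `runner` parameter are assumptions made for illustration. Step 2 (cold boot or poed restart) is left to the operator, since how to trigger it depends on the platform.

```python
import subprocess


def production_flow_commands(save_to_chip=False):
    """Return the poecli invocations for the flow above, in order."""
    cmds = [
        ["poecli", "restore_poe_system"],  # step 1: restore factory settings
        # step 2 (cold boot / poed restart) is done by the operator
        ["poecli", "cfg", "-s"],           # step 3: write poe_perm_cfg.json
    ]
    if save_to_chip:
        cmds.append(["poecli", "savechip"])  # step 4 (optional)
    return cmds


def run_flow(save_to_chip=False, runner=subprocess.run):
    """Run each command and stop at the first failure (check=True)."""
    for cmd in production_flow_commands(save_to_chip):
        runner(cmd, check=True)
```

Passing a fake `runner` makes it possible to dry-run the sequence on a machine without poecli installed.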

@WillyLiu-EC
Contributor

Hi @leonchiang
We didn't run any special command or take any special action; we just installed the .deb file, and then we got the failure message.

We both know that the procedure in the "Normal production flow example:" you mentioned fixes the failure message.
Non-AMZ and other users will hit the failure message if they use this codebase. How will they learn the procedure from you?

Best regards,
WillyLiu

@leonchiang leonchiang requested a review from rothcar January 12, 2022 10:34
@leonchiang
Contributor Author

leonchiang commented Jan 12, 2022

Hi @WillyLiu-EC:

If needed, we can add a basic in-code user guide and troubleshooting document for Administrators/Operators who control the POE function, and create a sub-command like "poecli guide" that prints the user guide to the console. For the troubleshooting guide (same document), we can add some resolvable cases for non-AMZ users' reference.

Best Regards.
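
A minimal sketch of what such a sub-command could look like with argparse is shown here. This is an illustration under assumptions, not the actual poecli implementation: the parser layout, `GUIDE_TEXT`, and its wording are placeholders; only the command names come from this thread.

```python
import argparse

# Placeholder guide text; the real document would be longer.
GUIDE_TEXT = """\
POE agent quick guide (placeholder text)
  poecli restore_poe_system   restore factory settings
  poecli cfg -s               save current state to poe_perm_cfg.json
  poecli savechip             save current state to the chip's NV-MEMORY
"""


def build_parser():
    """Build a parser with a single hypothetical 'guide' sub-command."""
    parser = argparse.ArgumentParser(prog="poecli")
    sub = parser.add_subparsers(dest="command")
    sub.add_parser("guide", help="print the user guide to the console")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    if args.command == "guide":
        print(GUIDE_TEXT, end="")
        return 0
    return 1  # no sub-command given
```

Keeping the guide as a string inside the code means it ships with the .deb and needs no extra documentation file.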

Add userguide in code for Administrator/Operator to control POE function
Usage:
    ~# poecli guide

Signed-off-by: leon.chiang <[email protected]>
@WillyLiu-EC
Contributor

Hi @leonchiang
OK.

Best regards,
WillyLiu

@leonchiang
Contributor Author

Hi @rothcar:
We've added a user guide in the recent commit 07e3a60 (at @WillyLiu-EC's request); please help us review it.
We also changed some help text to make the descriptions more accurate (poecli show -j prints JSON to stdout, not to a file).

Best Regards.
Leon

@akenliu akenliu left a comment

The modified code and user guide are fine for Accton/Edge-core.

Contributor

@rothcar rothcar left a comment

LGTM, thanks

@rothcar rothcar merged commit 075be47 into dentproject:main Feb 8, 2022