Push hardware information to OpenJTS #39

Closed
nguyenduchoa37 opened this issue Jun 25, 2024 · 8 comments

@nguyenduchoa37

Hi.

I installed OpenJTS to monitor an MX960 with the Health Monitoring Profile. As a test, I shut down an FPC on the MX960 (using the command request chassis fpc slot <slot> offline), but I did not see any alarm or warning in the Grafana web GUI, even after 4-5 minutes. Is there a way to detect hardware errors quickly with OpenJTS?

@door7302
Owner

Hello,

Could you please share the Junos version and the card model?

David

@nguyenduchoa37
Author

Hi.

I am testing on an MX960 running Junos 20.4R3-S8.1 with an MPC10E. If this error does appear in Grafana, what kind of log entry will it be? And how many seconds after the card goes down on the box will that log show up?

@door7302
Owner

I believe manually shutting down an MPC is not considered an error. If you want, I can give you a command to simulate a hardware error in your lab.

@nguyenduchoa37
Author

nguyenduchoa37 commented Jun 27, 2024 via email

@door7302
Owner

door7302 commented Jun 28, 2024

FOR LAB ONLY

1/ start shell pfe network fpcX.0 <<< X = slot number

2/ show cmerror module <<<< Identify the module ID for “Storage device” - in my case this is 5

3/ show cmerror module 5

Error-id  PFE  Level  Threshold  Count  Occured  Cleared  Last-occurred(ms ago)  Name
0x2c0002  0    Major  1          0      0        0        0                      CPU_CMERROR_STORAGE_MSATA_DISABLED
0x2c0001  0    Minor  1          0      0        0        0                      CPU_CMERROR_STORAGE_SMARTD_ERROR
0x2c0003  0    Minor  1          0      0        0        0                      CPU_CMERROR_STORAGE_ACCESS_ERROR

Pick the hexadecimal Error-id of a Major error, together with its name, and simulate the error (the arguments are the Error-id, the PFE, the error name, and the module ID from step 2):

4/ test cmerror trigger-error 0x2c0002 0 CPU_CMERROR_STORAGE_MSATA_DISABLED 5

5/ exit

Now you should see a MAJOR ALARM

6/ regress@rtme-mx-25> show chassis alarms
3 alarms currently active
Alarm time Class Description
2024-06-28 06:30:54 PDT Major FPC 2 Major Errors
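
If you would rather check for this alarm from a script than by eye, the output above can be parsed with a few lines of Python. This is only a minimal sketch: it assumes the `show chassis alarms` text has already been collected somehow (how you fetch it is up to you), that every alarm row follows the layout shown above, and that FPC alarm descriptions start with "FPC <slot>". The names `Alarm`, `parse_chassis_alarms` and `has_fpc_major_alarm` are made up for illustration and are not part of OpenJTS.

```python
import re
from typing import List, NamedTuple


class Alarm(NamedTuple):
    time: str
    alarm_class: str
    description: str


# One alarm row: "<date> <time> <timezone> <class> <description>"
ALARM_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+)\s+(Major|Minor)\s+(.+)$"
)

# Sample text copied from the output above.
SAMPLE_ALARMS = """\
3 alarms currently active
Alarm time Class Description
2024-06-28 06:30:54 PDT Major FPC 2 Major Errors
"""


def parse_chassis_alarms(output: str) -> List[Alarm]:
    """Return one Alarm per row that looks like an alarm line."""
    alarms = []
    for line in output.splitlines():
        match = ALARM_RE.match(line.strip())
        if match:  # the count line and the header line do not match
            alarms.append(Alarm(*match.groups()))
    return alarms


def has_fpc_major_alarm(alarms: List[Alarm], slot: int) -> bool:
    """True if a Major alarm whose description starts with 'FPC <slot> ' exists."""
    prefix = f"FPC {slot} "
    return any(
        a.alarm_class == "Major" and a.description.startswith(prefix)
        for a in alarms
    )


if __name__ == "__main__":
    parsed = parse_chassis_alarms(SAMPLE_ALARMS)
    print(has_fpc_major_alarm(parsed, 2))
```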

In OpenJTS you should see:

image

To clear the alarm you need to reboot
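
The selection done by hand in steps 3 and 4 can also be sketched in Python, in case you want to repeat the test on several cards. This is a rough illustration under stated assumptions, not a tool shipped with OpenJTS: it assumes each data row of `show cmerror module <id>` has the nine columns shown in step 3, and the helper names `CmError`, `parse_cmerror_module` and `build_trigger_command` are hypothetical. It only builds the command string; it does not connect to the router.

```python
from typing import List, NamedTuple


class CmError(NamedTuple):
    error_id: str
    pfe: int
    level: str
    name: str


# Sample text copied from step 3 above.
SAMPLE_CMERROR = """\
Error-id PFE Level Threshold Count Occured Cleared Last-occurred(ms ago) Name
0x2c0002 0 Major 1 0 0 0 0 CPU_CMERROR_STORAGE_MSATA_DISABLED
0x2c0001 0 Minor 1 0 0 0 0 CPU_CMERROR_STORAGE_SMARTD_ERROR
0x2c0003 0 Minor 1 0 0 0 0 CPU_CMERROR_STORAGE_ACCESS_ERROR
"""


def parse_cmerror_module(output: str) -> List[CmError]:
    """Parse data rows: nine whitespace-separated fields starting with a hex ID."""
    errors = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 9 and fields[0].startswith("0x"):
            errors.append(
                CmError(
                    error_id=fields[0],
                    pfe=int(fields[1]),
                    level=fields[2],
                    name=fields[8],
                )
            )
    return errors


def build_trigger_command(error: CmError, module_id: int) -> str:
    """Build the step 4 command: Error-id, PFE, name, then module ID."""
    return (
        f"test cmerror trigger-error {error.error_id} "
        f"{error.pfe} {error.name} {module_id}"
    )


if __name__ == "__main__":
    majors = [e for e in parse_cmerror_module(SAMPLE_CMERROR) if e.level == "Major"]
    # Module ID 5 is the "Storage device" module from step 2 in this lab.
    print(build_trigger_command(majors[0], 5))
```

As in the manual steps, this is for lab use only, since the triggered alarm stays active until a reboot.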

@door7302 door7302 closed this as completed Jul 8, 2024
@door7302 door7302 reopened this Jul 8, 2024
@nguyenduchoa37
Author

nguyenduchoa37 commented Jul 8, 2024 via email

@door7302
Owner

door7302 commented Aug 7, 2024

Any updates?

@door7302 door7302 closed this as completed Aug 7, 2024
@door7302 door7302 reopened this Aug 7, 2024
@nguyenduchoa37
Author

nguyenduchoa37 commented Aug 8, 2024 via email
