UPS Agent: fully recover from bad connection issue #453

Closed
mhasself opened this issue Jun 5, 2023 · 0 comments · Fixed by #457
Labels: bug (Something isn't working)

mhasself commented Jun 5, 2023

Despite #415 / #445, I am still seeing frequent crashes on the site UPS acquisition threads. I'm using container socs:v0.4.2-14-ge13cce1-dev, which includes the fixes from #445.

I think the failure is occurring when the UPS comes back online. When there is finally a successful call to snmp.get, I think it is not returning all the data that it used to. It would be helpful to get better logging on block structure issues.

Here is the last bit of the error log. Note that when it finally crashes, there is only 1 SNMP timeout message (whereas previously there were 3 between each "Trying to reconnect" message):

...
2023-06-05T13:13:24+0000 No SNMP response. Check your connection.
2023-06-05T13:13:25+0000 Trying to reconnect.
2023-06-05T13:13:34+0000 192.168.2.131 failure: [Failure instance: Traceback (failure with no frames): <class 'pysnmp.proto.errind.RequestTimedOut'>: No SNMP response received before timeout
]
2023-06-05T13:13:40+0000 192.168.2.131 failure: [Failure instance: Traceback (failure with no frames): <class 'pysnmp.proto.errind.RequestTimedOut'>: No SNMP response received before timeout
]
2023-06-05T13:13:47+0000 192.168.2.131 failure: [Failure instance: Traceback (failure with no frames): <class 'pysnmp.proto.errind.RequestTimedOut'>: No SNMP response received before timeout
]
2023-06-05T13:13:47+0000 No SNMP response. Check your connection.
2023-06-05T13:13:48+0000 Trying to reconnect.
2023-06-05T13:13:56+0000 192.168.2.131 failure: [Failure instance: Traceback (failure with no frames): <class 'pysnmp.proto.errind.RequestTimedOut'>: No SNMP response received before timeout
]
2023-06-05T13:14:02+0000 192.168.2.131 failure: [Failure instance: Traceback (failure with no frames): <class 'pysnmp.proto.errind.RequestTimedOut'>: No SNMP response received before timeout
]
2023-06-05T13:14:08+0000 192.168.2.131 failure: [Failure instance: Traceback (failure with no frames): <class 'pysnmp.proto.errind.RequestTimedOut'>: No SNMP response received before timeout
]
2023-06-05T13:14:08+0000 No SNMP response. Check your connection.
2023-06-05T13:14:09+0000 Trying to reconnect.
2023-06-05T13:14:16+0000 192.168.2.131 failure: [Failure instance: Traceback (failure with no frames): <class 'pysnmp.proto.errind.RequestTimedOut'>: No SNMP response received before timeout
]
2023-06-05T13:14:20+0000 acq:0 CRASH: [Failure instance: Traceback: <class 'Exception'>: Block structure does not match: ups
/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py:696:callback
/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py:798:_startRunCallbacks
/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py:892:_runCallbacks
/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py:1792:gotResult
--- <exception caught here> ---
/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py:1697:_inlineCallbacks
/usr/local/lib/python3.8/dist-packages/socs/agents/ups/agent.py:380:acq
/usr/local/lib/python3.8/dist-packages/ocs/ocs_agent.py:542:publish_to_feed
/usr/local/lib/python3.8/dist-packages/ocs/ocs_feed.py:224:publish_message
/usr/local/lib/python3.8/dist-packages/ocs/ocs_feed.py:36:append
]
2023-06-05T13:14:20+0000 acq:0 Status is now "done".
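
As a rough illustration of the kind of logging that would help here, the sketch below compares the field names in a freshly read data block against the field names seen previously and reports which ones are missing or new. This is not the agent's actual code: `describe_block_mismatch`, `check_before_publish`, the logger name, and the field names in the usage note are all hypothetical, and the sketch assumes the block's data is a plain dict keyed by field name.

```python
import logging

# Hypothetical logger name, used only for this sketch.
log = logging.getLogger("ups_agent_sketch")


def describe_block_mismatch(expected_keys, data):
    """Summarize how the keys of `data` differ from `expected_keys`.

    Returns None when the two sets of field names are identical,
    otherwise a string listing the missing and unexpected fields.
    """
    expected = set(expected_keys)
    received = set(data)
    missing = sorted(expected - received)
    extra = sorted(received - expected)
    if not missing and not extra:
        return None
    return f"missing fields: {missing}; unexpected fields: {extra}"


def check_before_publish(block_name, expected_keys, data):
    """Log a warning and return False if the block structure changed.

    The caller could then skip publishing this sample instead of
    letting the mismatch raise and end the acquisition process.
    """
    summary = describe_block_mismatch(expected_keys, data)
    if summary is not None:
        log.warning("Block '%s' structure changed: %s", block_name, summary)
        return False
    return True
```

For example, with made-up field names, `check_before_publish("ups", ["battery_status", "output_voltage"], {"battery_status": 2})` would log that `output_voltage` is missing and return `False`, which would show directly whether the first successful read after an outage came back with fewer fields.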
