UPS Agent: fully recover from bad connection issue #453

mhasself · 2023-06-05T20:26:18Z

Despite #415 / #445, I am still seeing frequent crashes in the site UPS acquisition threads. I'm using container socs:v0.4.2-14-ge13cce1-dev, which includes the fixes from #445.

I think the failure occurs when the UPS comes back online. When a call to snmp.get finally succeeds, I suspect it is not returning all the data that it used to. It would be helpful to get better logging when the block structure does not match.
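To illustrate the suspected failure mode, here is a toy model of a feed block whose field set is fixed by its first sample. The `ToyBlock` class and the field names are hypothetical; this is not the actual `ocs.ocs_feed` code, only a sketch of the same kind of check. A later sample with fewer fields (as from a partial SNMP response) raises, and the error names the missing and extra fields, which is the sort of extra logging that would help diagnose this.

```python
class ToyBlock:
    """Hypothetical stand-in for a feed block: the first sample fixes the field set."""

    def __init__(self, name):
        self.name = name
        self.fields = None  # set of field names, fixed by the first append
        self.rows = []

    def append(self, data):
        keys = set(data)
        if self.fields is None:
            self.fields = keys
        elif keys != self.fields:
            # Report exactly which fields differ instead of only the block name.
            missing = sorted(self.fields - keys)
            extra = sorted(keys - self.fields)
            raise ValueError(
                f"Block structure does not match: {self.name} "
                f"(missing={missing}, extra={extra})"
            )
        self.rows.append(dict(data))


block = ToyBlock("ups")
block.append({"output_voltage": 120, "output_current": 3, "battery_status": 2})
try:
    # A partial response after reconnecting drops one field.
    block.append({"output_voltage": 119, "battery_status": 2})
except ValueError as err:
    print(err)  # Block structure does not match: ups (missing=['output_current'], extra=[])
```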

Here is the last bit of the error log. Note that when it finally crashes, there is only one SNMP timeout message (whereas previously there were three between each "Trying to reconnect" message):

...
2023-06-05T13:13:24+0000 No SNMP response. Check your connection.
2023-06-05T13:13:25+0000 Trying to reconnect.
2023-06-05T13:13:34+0000 192.168.2.131 failure: [Failure instance: Traceback (failure with no frames): <class 'pysnmp.proto.errind.RequestTimedOut'>: No SNMP response received before timeout
]
2023-06-05T13:13:40+0000 192.168.2.131 failure: [Failure instance: Traceback (failure with no frames): <class 'pysnmp.proto.errind.RequestTimedOut'>: No SNMP response received before timeout
]
2023-06-05T13:13:47+0000 192.168.2.131 failure: [Failure instance: Traceback (failure with no frames): <class 'pysnmp.proto.errind.RequestTimedOut'>: No SNMP response received before timeout
]
2023-06-05T13:13:47+0000 No SNMP response. Check your connection.
2023-06-05T13:13:48+0000 Trying to reconnect.
2023-06-05T13:13:56+0000 192.168.2.131 failure: [Failure instance: Traceback (failure with no frames): <class 'pysnmp.proto.errind.RequestTimedOut'>: No SNMP response received before timeout
]
2023-06-05T13:14:02+0000 192.168.2.131 failure: [Failure instance: Traceback (failure with no frames): <class 'pysnmp.proto.errind.RequestTimedOut'>: No SNMP response received before timeout
]
2023-06-05T13:14:08+0000 192.168.2.131 failure: [Failure instance: Traceback (failure with no frames): <class 'pysnmp.proto.errind.RequestTimedOut'>: No SNMP response received before timeout
]
2023-06-05T13:14:08+0000 No SNMP response. Check your connection.
2023-06-05T13:14:09+0000 Trying to reconnect.
2023-06-05T13:14:16+0000 192.168.2.131 failure: [Failure instance: Traceback (failure with no frames): <class 'pysnmp.proto.errind.RequestTimedOut'>: No SNMP response received before timeout
]
2023-06-05T13:14:20+0000 acq:0 CRASH: [Failure instance: Traceback: <class 'Exception'>: Block structure does not match: ups
/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py:696:callback
/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py:798:_startRunCallbacks
/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py:892:_runCallbacks
/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py:1792:gotResult
--- <exception caught here> ---
/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py:1697:_inlineCallbacks
/usr/local/lib/python3.8/dist-packages/socs/agents/ups/agent.py:380:acq
/usr/local/lib/python3.8/dist-packages/ocs/ocs_agent.py:542:publish_to_feed
/usr/local/lib/python3.8/dist-packages/ocs/ocs_feed.py:224:publish_message
/usr/local/lib/python3.8/dist-packages/ocs/ocs_feed.py:36:append
]
2023-06-05T13:14:20+0000 acq:0 Status is now "done".
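The traceback shows the exception propagating out of the acq process (agent.py:380 → publish_to_feed → ocs_feed append), after which the process status becomes "done", so acquisition stops for good. One way to keep it alive, sketched below under the assumption that dropping an inconsistent sample is acceptable, is to check each sample's field set before publishing and log and skip mismatches. `StructureGuard` and `publish_sample` are hypothetical names; the message dict mirrors the `block_name` / `timestamp` / `data` layout OCS agents pass to `publish_to_feed`, but `publish` here is any callable so the sketch runs on its own.

```python
import logging

log = logging.getLogger("ups-sketch")


class StructureGuard:
    """Hypothetical pre-publish check: remember each block's field set and
    skip (with a log line) any sample whose fields differ, instead of letting
    the feed raise and end the acq process."""

    def __init__(self):
        self.known = {}  # block name -> frozenset of field names

    def check(self, block_name, data):
        keys = frozenset(data)
        # The first sample seen for a block defines its expected structure.
        expected = self.known.setdefault(block_name, keys)
        if keys == expected:
            return True
        log.warning(
            "Skipping sample for block %s: missing=%s extra=%s",
            block_name, sorted(expected - keys), sorted(keys - expected),
        )
        return False


def publish_sample(publish, guard, block_name, timestamp, data):
    """Publish only structurally consistent samples; return True if published."""
    if not guard.check(block_name, data):
        return False
    publish({"block_name": block_name, "timestamp": timestamp, "data": data})
    return True


sent = []
guard = StructureGuard()
publish_sample(sent.append, guard, "ups", 0.0, {"v": 120, "i": 3})
publish_sample(sent.append, guard, "ups", 1.0, {"v": 119})  # logged and skipped
```

One caveat of this design: whichever sample arrives first fixes the structure, so if that first sample is itself partial, later complete samples would be the ones skipped.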


BrianJKoopman added the bug Something isn't working label Jun 5, 2023

BrianJKoopman assigned davidvng Jun 5, 2023

davidvng mentioned this issue Jun 12, 2023

Separate blocks for each input/output in UPS Agent #457

Merged

BrianJKoopman closed this as completed in #457 Aug 3, 2023
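The fix title, "Separate blocks for each input/output", suggests the idea: if each UPS input/output row gets its own block, a response that is missing some rows only drops those blocks for that cycle, while every block that is published keeps the same fields. The sketch below assumes the mismatch comes from whole rows going missing and uses hypothetical flat keys of the form `<table>_<index>_<field>`; it is not the code merged in #457.

```python
from collections import defaultdict


def split_into_blocks(readings, base="ups"):
    """Group flat readings into one block per (table, row index).

    Keys shaped like "output_1_voltage" go to block "<base>_output_1" as field
    "voltage"; keys without a numeric row index (e.g. "battery_charge") stay in
    the shared "<base>" block. The key scheme is hypothetical.
    """
    blocks = defaultdict(dict)
    for key, value in readings.items():
        parts = key.split("_", 2)
        if len(parts) == 3 and parts[1].isdigit():
            table, index, field = parts
            blocks[f"{base}_{table}_{index}"][field] = value
        else:
            blocks[base][key] = value
    return dict(blocks)


full = {"output_1_voltage": 120, "output_2_voltage": 121, "battery_charge": 100}
partial = {"output_1_voltage": 119, "battery_charge": 100}  # output row 2 missing
print(split_into_blocks(full))
print(split_into_blocks(partial))
```

With a single shared block, the partial response would change that block's field set; with per-row blocks, `ups_output_2` is simply absent for that cycle and `ups` and `ups_output_1` keep their structure.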
