Despite #415 / #445, I am still seeing frequent crashes on the site UPS acquisition threads. I'm using container socs:v0.4.2-14-ge13cce1-dev, which includes the fixes from #445.
I think the failure occurs when the UPS comes back online. When a call to snmp.get finally succeeds, I think it does not return all the data that it used to. It would be helpful to get better logging on block structure issues.
Here is the last bit of the error log. Note that when it finally crashes, there is only 1 SNMP timeout message, whereas previously there were 3 between each "Trying to reconnect" message:
...
2023-06-05T13:13:24+0000 No SNMP response. Check your connection.
2023-06-05T13:13:25+0000 Trying to reconnect.
2023-06-05T13:13:34+0000 192.168.2.131 failure: [Failure instance: Traceback (failure with no frames): <class 'pysnmp.proto.errind.RequestTimedOut'>: No SNMP response received before timeout
]
2023-06-05T13:13:40+0000 192.168.2.131 failure: [Failure instance: Traceback (failure with no frames): <class 'pysnmp.proto.errind.RequestTimedOut'>: No SNMP response received before timeout
]
2023-06-05T13:13:47+0000 192.168.2.131 failure: [Failure instance: Traceback (failure with no frames): <class 'pysnmp.proto.errind.RequestTimedOut'>: No SNMP response received before timeout
]
2023-06-05T13:13:47+0000 No SNMP response. Check your connection.
2023-06-05T13:13:48+0000 Trying to reconnect.
2023-06-05T13:13:56+0000 192.168.2.131 failure: [Failure instance: Traceback (failure with no frames): <class 'pysnmp.proto.errind.RequestTimedOut'>: No SNMP response received before timeout
]
2023-06-05T13:14:02+0000 192.168.2.131 failure: [Failure instance: Traceback (failure with no frames): <class 'pysnmp.proto.errind.RequestTimedOut'>: No SNMP response received before timeout
]
2023-06-05T13:14:08+0000 192.168.2.131 failure: [Failure instance: Traceback (failure with no frames): <class 'pysnmp.proto.errind.RequestTimedOut'>: No SNMP response received before timeout
]
2023-06-05T13:14:08+0000 No SNMP response. Check your connection.
2023-06-05T13:14:09+0000 Trying to reconnect.
2023-06-05T13:14:16+0000 192.168.2.131 failure: [Failure instance: Traceback (failure with no frames): <class 'pysnmp.proto.errind.RequestTimedOut'>: No SNMP response received before timeout
]
2023-06-05T13:14:20+0000 acq:0 CRASH: [Failure instance: Traceback: <class 'Exception'>: Block structure does not match: ups
/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py:696:callback
/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py:798:_startRunCallbacks
/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py:892:_runCallbacks
/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py:1792:gotResult
--- <exception caught here> ---
/usr/local/lib/python3.8/dist-packages/twisted/internet/defer.py:1697:_inlineCallbacks
/usr/local/lib/python3.8/dist-packages/socs/agents/ups/agent.py:380:acq
/usr/local/lib/python3.8/dist-packages/ocs/ocs_agent.py:542:publish_to_feed
/usr/local/lib/python3.8/dist-packages/ocs/ocs_feed.py:224:publish_message
/usr/local/lib/python3.8/dist-packages/ocs/ocs_feed.py:36:append
]
2023-06-05T13:14:20+0000 acq:0 Status is now "done".
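For the logging request, here is a minimal sketch of what a more informative check might look like. It assumes the "Block structure does not match" error is raised when the set of field names in the new data differs from the set already in the block; that is an assumption and has not been confirmed against ocs_feed.py. The function names and the field names in the usage note are hypothetical and are not part of ocs or socs.

```python
import logging

logger = logging.getLogger("ups_agent_sketch")  # hypothetical logger name


def describe_block_mismatch(expected_keys, data):
    """Describe how the keys of `data` differ from `expected_keys`.

    Returns None when the two key sets are identical.
    """
    expected = set(expected_keys)
    received = set(data)
    missing = sorted(expected - received)
    extra = sorted(received - expected)
    if not missing and not extra:
        return None
    return f"missing fields: {missing}; unexpected fields: {extra}"


def check_before_publish(block_name, expected_keys, data):
    """Log any key mismatch and return False so the caller can skip publishing."""
    problem = describe_block_mismatch(expected_keys, data)
    if problem is not None:
        logger.warning("Block structure mismatch for %r: %s", block_name, problem)
        return False
    return True
```

With hypothetical field names, `check_before_publish("ups", ["battery_status", "output_voltage"], {"battery_status": 2})` would log that `output_voltage` is missing and return `False`, so the acq loop could skip that sample rather than crash.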