You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Jan 21, 2022. It is now read-only.
The bridge healthcheck logic at https://github.com/cloudfoundry-incubator/cf-abacus/blob/master/lib/utils/bridge/src/healthchecker.js sets isFailing any time a failure event is received, but that state will only ever get changed by a subsequent success event. If a failure has occurred and then a success event isn't received for a lengthy period (for example, no apps have been stopped or started) then the healtcheck will stay in a failed state permanently until the bridge is restarted.
It would make more sense to reset isFailing after the threshold has expired.
The text was updated successfully, but these errors were encountered:
Hello @amhuber,
Looking at the code I can tell that a failure can occurs only when 'usage.failure' is emitted. This happens when a particular Event from CloudController it not accepted by Abacus. In this case the bridge is retrying this same Event, no other events form Cloud Controller are taken into account. The healthcheck is staying in a failed state, it will turn into healthy state when the Event is successfully accepted. And then the bridge will read other events (if have any) from Cloud Controller.
The code has been refactored since the time of creation the issue. Can you please describe how did you reproduce your scenario.
Where this is an issue is in environments where we don't have any services in CF. If there is an issue with the CC then a failure event can be triggered in the abacus-services-bridge, but since there are no services there will never be an onSuccess event, so the bridge healthcheck reports as failed forever until the bridge is restarted. The only resolution on our end is to just not monitor the abacus-services-bridge healthcheck in environments that don't have any services, but it still seems like the logic could be improved in the healthcheck.
Sign up for freeto subscribe to this conversation on GitHub.
Already have an account?
Sign in.
The bridge healthcheck logic at https://github.com/cloudfoundry-incubator/cf-abacus/blob/master/lib/utils/bridge/src/healthchecker.js sets isFailing any time a failure event is received, but that state will only ever get changed by a subsequent success event. If a failure has occurred and then a success event isn't received for a lengthy period (for example, no apps have been stopped or started) then the healtcheck will stay in a failed state permanently until the bridge is restarted.
It would make more sense to reset isFailing after the threshold has expired.
The text was updated successfully, but these errors were encountered: