Add additional health and performance metrics #266
Merged
This PR adds a couple of new metrics to Broker that track message and store command throughput as well as buffer slots. It also introduces a new metric factory so that all metric strings (names and help texts) live in a single file.
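To illustrate the idea of keeping all metric strings in one place, here is a minimal sketch. It does not reproduce the factory from this PR: `metric_info`, `catalog`, `registry`, `metric_factory` and the metric names and help texts are all hypothetical and only show the general pattern.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <string_view>

// Hypothetical description of one metric: every string lives in this struct.
struct metric_info {
  std::string_view prefix;
  std::string_view name;
  std::string_view help;
};

// Hypothetical catalog: the single place that spells out names and help texts.
namespace catalog {
inline constexpr metric_info processed_messages{
  "broker", "processed-messages", "Number of processed messages."};
inline constexpr metric_info store_commands{
  "broker", "store-commands", "Number of processed store commands."};
} // namespace catalog

// Toy counter; a real system would use an atomic or the telemetry library.
class counter {
public:
  void inc(std::int64_t amount = 1) { value_ += amount; }
  std::int64_t value() const { return value_; }

private:
  std::int64_t value_ = 0;
};

// Toy registry that owns counters, keyed by "<prefix>.<name>".
class registry {
public:
  counter& counter_for(const metric_info& info) {
    auto key = std::string{info.prefix} + "." + std::string{info.name};
    auto [pos, added] = entries_.try_emplace(key);
    if (added)
      pos->second.help = std::string{info.help};
    return pos->second.value;
  }

  std::size_t size() const { return entries_.size(); }

private:
  struct entry {
    std::string help;
    counter value;
  };
  std::map<std::string, entry> entries_;
};

// The factory exposes typed accessors, so call sites never repeat a string.
class metric_factory {
public:
  explicit metric_factory(registry& reg) : reg_(&reg) {}

  counter& processed_messages() {
    return reg_->counter_for(catalog::processed_messages);
  }

  counter& store_commands() {
    return reg_->counter_for(catalog::store_commands);
  }

private:
  registry* reg_;
};
```

With this layout, renaming a metric or rewording its help text touches only the catalog, and call sites such as `factory.processed_messages().inc()` stay unchanged.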
We've also talked about having metrics that estimate how many bytes are bound up in Broker messages. I've tried that and I couldn't find a good approach. We consider a message as buffered (received but not processed) when it is in any of the flow stages in the core. There are three stages we care about here: data inputs, command inputs and the central merge point. While messages travel through the pipeline, we convert them from data and command messages to node messages.
For data and command messages, there's no easy way of estimating the size of the payload other than applying a serializer to the messages. Even a serializer that doesn't actually write any data and only keeps track of how many bytes would be written could add quite a bit of runtime overhead, since it still has to visit every value in every message. Ideally, the instrumentation shouldn't add any measurable overhead to the system. If we still want that buffer size estimation in bytes, maybe we can benchmark that overhead in a real system (Zeek instance) to see whether the extra CPU load is negligible or not.
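For reference, this is roughly what such a counting serializer looks like. It is a minimal sketch: `field`, `message`, `byte_counter` and `estimated_size` are hypothetical stand-ins, not Broker types, and the assumed wire format (fixed-width numbers, 4-byte length prefix for strings) is made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical, flattened stand-in for a message payload.
using field = std::variant<std::int64_t, double, std::string>;

struct message {
  std::string topic;
  std::vector<field> fields;
};

// Writes nothing; only adds up how many bytes a real serializer would emit
// under the assumed wire format.
class byte_counter {
public:
  void operator()(std::int64_t) { total_ += sizeof(std::int64_t); }

  void operator()(double) { total_ += sizeof(double); }

  void operator()(const std::string& str) {
    // Assumed encoding: 4-byte length prefix followed by the raw characters.
    total_ += sizeof(std::uint32_t) + str.size();
  }

  std::size_t total() const { return total_; }

private:
  std::size_t total_ = 0;
};

// Walks the whole message once, so the cost grows with the number of fields.
std::size_t estimated_size(const message& msg) {
  byte_counter counter;
  counter(msg.topic);
  for (const auto& f : msg.fields)
    std::visit(counter, f);
  return counter.total();
}
```

Even though no bytes are written, `estimated_size` has to traverse each field of each buffered message, which is the overhead that would need benchmarking.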
Relates #254.