Enhancing Prometheus metrics for BW consumption #1077
Comments
Currently, Prometheus collects the sent and received bandwidth (BW); however, these metrics lack two major labels: the channel ID and the reactor. We need to add both. |
In the existing implementation, there is a single reactor associated with each channel ID. Therefore, we can deduce the reactor from the channel ID label alone. Given this, our approach is to append only the channel ID label to each metric.
Lines 633 to 637 in 2f93fc8
Line 690 in 2f93fc8
Lines 153 to 158 in 2f93fc8
|
Is this guaranteed for future modifications to CometBFT? Is there a downside to including two labels? |
Thank you for your suggestion. Our current implementation, which tracks the channel ID, adequately addresses the requirements of this issue. Adding an extra label could introduce redundancy and complexity without a significant improvement in observability. While identifying reactors directly would add some observability, we opted for the current approach to keep things simple and efficient while still meeting the objectives of this issue. |
Apparently, no two reactors can share the same channel ID, as suggested in Line 166 in 3f3b7cc
|
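Since each channel ID belongs to exactly one reactor, a dashboard or script can recover the reactor from the `chID` label with a plain lookup. The following is a minimal sketch of that idea, not code from the repository: the `reactorByChannel` table and the `reactorForChID` helper are hypothetical names, and the channel values listed are assumptions that should be checked against the revision of the code you are running.

```go
package main

import (
	"fmt"
	"strconv"
)

// reactorByChannel is an illustrative lookup table (an assumption, not taken
// from this issue). Verify the IDs against the code revision in use.
var reactorByChannel = map[byte]string{
	0x00: "pex",
	0x20: "consensus", // state
	0x21: "consensus", // data
	0x22: "consensus", // vote
	0x23: "consensus", // vote set bits
	0x30: "mempool",
	0x38: "evidence",
	0x40: "blockchain",
	0x60: "statesync", // snapshot
	0x61: "statesync", // chunk
}

// reactorForChID maps a chID label value such as "0x40" to a reactor name.
// It returns an error for malformed or unknown channel IDs.
func reactorForChID(label string) (string, error) {
	// Base 0 lets ParseUint accept the "0x" prefix used in the label.
	id, err := strconv.ParseUint(label, 0, 8)
	if err != nil {
		return "", fmt.Errorf("invalid chID %q: %w", label, err)
	}
	name, ok := reactorByChannel[byte(id)]
	if !ok {
		return "", fmt.Errorf("no reactor known for chID %q", label)
	}
	return name, nil
}

func main() {
	for _, label := range []string{"0x40", "0x22", "0x99"} {
		name, err := reactorForChID(label)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", label, name)
	}
}
```

Note that the mapping is many-to-one: one reactor may own several channels, but a channel never maps to two reactors, which is why the channel ID label alone is enough.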
if I recall correctly, I've been able to determine the bandwidth used before via channel ID, but would have to dig through the metrics. Will hopefully be able to do that later today depending on if I can get the network tests working again |
[Reiterating the findings from our discussion in #1078], it has been clarified that the metrics in question are |
…ssageReceiveBytesTotal Prometheus metrics (#1086)
In line with #1077.
Our past experimentation showed that none of the current Prometheus traffic-related metrics carries all of the message type, peer ID and channel ID together. This gap can be closed by adding peer IDs to `message_receive_bytes_total` and `message_send_bytes_total`, which is what this PR does. I tested it by running a local validator node and examining the Prometheus metrics endpoint. Here's an example of the output:
```
cometbft_p2p_message_send_bytes_total{chID="0x40", chain_id="mocha-4", message_type="blockchain_StatusResponse", peer_id="cb7adca6fbaabc5336c5eebbc1312390a2bb9d2d", version="1.0.0-rc0-278-g4c452c5f4"} 8
```
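With `chID` present on `cometbft_p2p_message_send_bytes_total`, a per-channel bandwidth breakdown can be computed from text scraped off the metrics endpoint. The sketch below is one possible way to do that using only the Go standard library; `parseSample` and `sumBytesByChannel` are hypothetical helpers, and the parser is deliberately simplified (it assumes label values contain no commas or escaped quotes). In practice a PromQL aggregation by `chID` would do the same job.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseSample splits one exposition line of the form
//   name{k1="v1", k2="v2"} value
// into its labels and numeric value. Simplification: label values must not
// contain commas.
func parseSample(line string) (map[string]string, float64, error) {
	open := strings.IndexByte(line, '{')
	end := strings.LastIndexByte(line, '}')
	if open < 0 || end < open {
		return nil, 0, fmt.Errorf("no label set in %q", line)
	}
	labels := make(map[string]string)
	for _, pair := range strings.Split(line[open+1:end], ",") {
		pair = strings.TrimSpace(pair)
		if pair == "" {
			continue
		}
		key, quoted, ok := strings.Cut(pair, "=")
		if !ok {
			return nil, 0, fmt.Errorf("malformed label %q", pair)
		}
		val, err := strconv.Unquote(quoted)
		if err != nil {
			return nil, 0, fmt.Errorf("label %q: %w", key, err)
		}
		labels[key] = val
	}
	// The value is the first field after '}'; a trailing timestamp is ignored.
	fields := strings.Fields(line[end+1:])
	if len(fields) == 0 {
		return nil, 0, fmt.Errorf("no value in %q", line)
	}
	value, err := strconv.ParseFloat(fields[0], 64)
	return labels, value, err
}

// sumBytesByChannel adds up the samples of one metric, grouped by chID.
func sumBytesByChannel(exposition, metric string) (map[string]float64, error) {
	totals := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Skip comments, other metrics, and samples without labels.
		if !strings.HasPrefix(line, metric+"{") {
			continue
		}
		labels, value, err := parseSample(line)
		if err != nil {
			return nil, err
		}
		totals[labels["chID"]] += value
	}
	return totals, sc.Err()
}

func main() {
	sample := `cometbft_p2p_message_send_bytes_total{chID="0x40", chain_id="mocha-4", message_type="blockchain_StatusResponse", peer_id="cb7adca6fbaabc5336c5eebbc1312390a2bb9d2d", version="1.0.0-rc0-278-g4c452c5f4"} 8`
	totals, err := sumBytesByChannel(sample, "cometbft_p2p_message_send_bytes_total")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for ch, bytes := range totals {
		fmt.Printf("chID=%s bytes=%v\n", ch, bytes)
	}
}
```

Grouping on `peer_id` or `message_type` instead only requires changing the label key used for the map.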
Closes #1077
I have verified the outcome of this pull request, and the block time gauge now correctly appears in the Prometheus endpoint:
```
# HELP cometbft_consensus_block_time_seconds Duration between this block and the preceding one.
# TYPE cometbft_consensus_block_time_seconds gauge
cometbft_consensus_block_time_seconds{chain_id="mocha-4",version="1.0.0-rc16"} 11.623671263
```
closed by #1091 |
As part of celestiaorg/celestia-app#2197, in order to calculate the breakdown of BW consumption per chain ID and reactor, we would like to use the existing Prometheus metrics in celestia-core. However, it is not clear whether the current metrics are sufficient or whether we need additional metrics to accomplish the task.
This issue intends to track the progress of this effort.