Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT MERGE] Proposal for Upkeep State Telemetry #302

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

EasterTheBunny
Copy link
Contributor

@EasterTheBunny EasterTheBunny commented Jan 8, 2024

Relying on log output for debugging has been challenging at best and error prone at worst. This is frequently due to the fluctuating nature of full text log output as changes occur in the code base. On top of that, log collection at a full debug level for simple state telemetry is highly wasteful as most of the log output doesn't directly relate to state changes.

This proposal introduces distinct states of a specific upkeep that are indicated as being altered at specific points in the program. The approach replaces the default logger with a special telemetry logger that adds telemetry logging functions.

To view potential output of the telemetry logging, run the following in a local terminal:

$ make simulator && ./bin/simulator --simulate --verbose -f ./tools/simulator/plans/only_log_trigger.json

Then view ./simulation_plan_logs/simulation.log for "Status Delays per Work ID". The output attempts to indicated the time taken for an upkeep to reach specific states. This allows for evaluation of the time spent at each point in the process.

Relying on log output for debugging has been challenging at best and error prone at
worst. This is frequently due to the fluctuating nature of full text log output as
changes occur in the code base. On top of that, log collection at a full debug level
for simple state telemetry is highly wasteful as most of the log output doesn't
directly relate to state changes.

This proposal introduces distinct states of a specific upkeep that are indicated as
being altered at specific points in the program. The approach replaces the default
logger with a special telemetry logger that adds telemetry logging functions.

To view potential output of the telemetry logging, run the following in a local
terminal:

$ make simulator && ./bin/simulator --simulate --verbose -f ./tools/simulator/plans/only_log_trigger.json

Then view `./simulation_plan_logs/simulation.log` for "Status Delays per Work ID". The
output attempts to indicated the time taken for an upkeep to reach specific states.
This allows for evaluation of the time spent at each point in the process.
@amirylm
Copy link
Collaborator

amirylm commented Jan 9, 2024

The general idea of reporting the states of upkeeps is good, but we shouldn't create such coupling with the logger.
Why not starting with prometheus collectors, or at least follow their architecture where the collectors are independent, decoupled from the components that uses them.

We can have collectors (and states) exposed from a single package, use prometheus to collect the metrics and to query it from the simulator as part of the test.

cc @ogtownsend

@EasterTheBunny
Copy link
Contributor Author

The general idea of reporting the states of upkeeps is good, but we shouldn't create such coupling with the logger. Why not starting with prometheus collectors, or at least follow their architecture where the collectors are independent, decoupled from the components that uses them.

We can have collectors (and states) exposed from a single package, use prometheus to collect the metrics and to query it from the simulator as part of the test.

cc @ogtownsend

It does make sense to decouple from the logger and I assume the components that need the telemetry collector would require it in their constructors along with a logger?

My only issue with your proposal of using prometheus is the potential coupling with a 3rd party. As long as we structure the collector dependency in an abstract way that essentially wraps prometheus, I would be good with that. I think that is what you are saying in your last statement, but I just want to make sure.

@amirylm
Copy link
Collaborator

amirylm commented Jan 9, 2024

abstract way that essentially wraps prometheus

That is usually a common pattern, but we should follow guidelines we have in the core node which seems to be adding prometheus dependency (github.com/prometheus/client_golang/prometheus[/promauto])

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants