Consistent logging #270

Open
matejpavlovic opened this issue Oct 11, 2022 · 3 comments · May be fixed by #424
Labels: backlog (to address eventually, not necessarily now), cleanup, documentation (improvements or additions to documentation)

Comments

@matejpavlovic
Contributor

matejpavlovic commented Oct 11, 2022

Proposition

Logging

Each module that produces any log output should have a logger parameter. Important events during the module's operation should produce a log output. In the context of distributed protocols, the log levels have the following semantics:

FATAL

This should never happen and causes an immediate system halt. No further execution is meaningful once the error occurs. This log entry should provide the immediate reason for the crash.

ERROR

This should never happen. An error is logged if and only if the functionality of the system might be compromised and the system might not provide the advertised guarantees. This should only be the case if there is a bug in the implementation or if some of the assumptions of the implemented protocol (e.g. on the upper bound of faulty nodes) have been violated.

WARNING

A warning indicates that something potentially suspicious, but technically legitimate, is happening. When a warning is output, the distributed system as a whole is still fully functional, even though parts of it might no longer be. A warning is the sign of something being sub-optimal, but accounted for. For example, a warning should be triggered by the reception of a malformed message (another node might be faulty), a buffer running out of space and dropping messages (network asynchrony or sub-optimal configuration), a node (local or remote) severely lagging behind (slow node), etc. If all the nodes are correct and operating smoothly (no significant slow-down) and the network does not drop or significantly delay messages, no warnings should be logged.

INFO

An event that indicates the system making progress in its operation. Info-level log events should be used sparingly, such that the stream of info-level output gives good insight into what the node is currently doing, but is still observable in real time by a human. For example, a block being delivered, the configuration having changed, or a checkpoint being created are all typical info-level events.

DEBUG

Debug events provide insight into the internal workings of the system. The debug events can be produced at a higher rate than a human can perceive and are intended for recording and studying later. Interpreting a debug-level log entry can require deep knowledge of the details of the implementation.

TRACE

Very fine-grained log messages, providing detailed insight into what code is being executed. Trace-level logging is not expected to occur throughout the code, but should only be inserted when needed.

@matejpavlovic added the ADR and documentation labels and removed the ADR label on Oct 11, 2022
@sergefdrv
Contributor

Have you considered a Fatal level of logging with the semantics as in many widely used logging packages, e.g. log, zap, go-logger?

What is the principal difference between Debug and Trace levels and how to decide which one to use?

@abread
Contributor

abread commented Oct 21, 2022

Have you considered standardizing on tracing rather than logging?

It's a godsend for debugging distributed systems! In the case of Mir, even if we only capture traces from a single node, it could prove very useful for debugging interactions where non-Go modules are involved.

@matejpavlovic
Contributor Author

Yes, we can have Fatal too, good idea. I added it to the description, as well as a differentiation between Debug and Trace.

@abread, thanks for the pointer. The possibility of tracing is one of the main strengths of Mir (see #220). A rudimentary tool is already there (the event interceptor and mircat) and is already very useful, but it could definitely be extended.

@matejpavlovic added the backlog and cleanup labels and removed the backlog label on Apr 5, 2023
@matejpavlovic linked a pull request on Apr 28, 2023 that will close this issue
@matejpavlovic changed the title from "Consistent logging and handling of errors" to "Consistent logging" on Apr 28, 2023
@matejpavlovic added the backlog label on May 11, 2023
4 participants