-
Notifications
You must be signed in to change notification settings - Fork 382
Guidelines for choosing logging levels
-
logger.critical
is used to report conditions that may be unrecoverable. It complete library/app failure is likely, thenlogger.critical
should be used. -
logger.error
is used to report things that need the immediate attention of the app or library developers. If something doesn't warrent a new bug in some tracking system, it probably shouldn't be reported withlogger.error
-
logger.warning
is used to report things that are outside the bounds of "normal operation", but not problematic enough to warrant opening a new bug. -
logger.info
is used to report significant events in the flow of the library. We wantlogger.info
to show what the app is doing, but not with so much detail that the meaning is lost. -
logger.debug
is used to report details on how the app is doing things.
Of these, the line between logger.info
and logger.debug
is the most vague. To decide between these two, use the following guidelines:
- If in doubt, use
logger.debug
. We don't want to floodlogger.info
with too much detail. - If you're logging to prove that your algorithm is flowing corectly, use
logger.debug
. - If something calls into the library, either from above or below,
logger.info
might be appropriate. (unless the calls are really noisy) - If something external returns an error,
logger.info
might be appropriate. (unless we expect frequent errors in this condition and we recover) - If internal logic determines that an error occured, we report it using
logger.info
- If the flow of the program does something unexpected or out of the ordinary,
logger.info
might be appropriate. - If we get more than 1 or 2
logger.info
calls per second, there are probably too manylogger.info
calls.
Some customers will redirect logger calls to a web service, such that each call to logger.foo
will result in an HTTP call pushing the output string over the network. This is why we want to be conservative with calls to any level above logger.debug
.
If you think about viewing the log as a tech on the customer side:
logger.critical
requires immediate action. Either a phone-call to the person on call or an emergency reboot.
logger.error
gets reported and handled as soon as possible.
logger.warning
may or may not even be noticed.
logger.info
is used when someone at the customer site wants to figure out what's going on.
logger.debug
is used when one of the library developers (or someone else with deep understanding) wants to figure out what's going on.
We want to avoid logging the same condition multiple times. This typically happens with errors that get returned, where a function logs an error, then returns the error, and the calling function logs it before returning, and so on.
To avoid this, we use the following convention:
- If we get an error from an external API, we log it and
info
level. We use `info because it is somewhat important to note that an external component failed, but it is not yet an error that we need to report to the caller. - If we get an error from an internal function, we don't log it. We assume that the error was logged when it actually happened or when it was returned to us from external code.
- If we call
handle_background_exception
, we don't need to log becausehandle_background_exception
will log it for us.
(very rare)
Something bad has happened and we can no longer guarantee that things are going to work.
Examples
- threading.Thread() call raises an Exception
(rare)
Something bad has happened, and we either couldn't report it to you or it's so bad that you need to investigate it.
Examples:
- Unexpected state
- operation still not handled at last stage
- attempting to complete an already-completed op.
- No onConnected callback is registered (error becasue our code always registers this).
- Our code raises an exception inside of a transport callback, we catch it and use logger.error to report it, but don't want to crash the transport.
(very rare)
Something happened that we couldn't report to you, but we think it might not be that bad.
Examples:
- Message received on a module input, but user is not subscribed for that input.
(common)
Something significant happened, but you only get the high level info.
Examples:
- User called into our code and we started something
- We returned an error to the user
- We called into the transport
- The transport calls into us
- An error was created inside our code out of something that wasn't an error. e.g. if queue_size == 0: raise Exception("queue should not be empty").
- Errors that come from another layer do not warrant a logger.info call.
- A new operation was initiated as a side-effect of another. e.g. starting a connection on publishing
- A new event is crated as a side-effect of something unrelated.
- Completion of an operation is cut-short. e.g. ConnectionOp() completes early because transport is already connected.
- We receive a message on a topic that isn't registered or the MID or RID is unexpected
- An op is put into a queue.
- The queue unblocks and ops start again
- A timer caused something new to start. e.g. "starting sas token renewal".
(super-duper common)
Anything else.
Examples:
- Calling into some internal object.
- An error is returned from some internal object
- An internal object got a call from some other internal object
- An internal object returned an error to some other internal object.
- An operation is completed or ignored
- Details on some internal structure or some internal processing. e.g. "topic parsed. method name=blah"
- Progress report on some internal flow. (e.g. Calling into SSL, done initializing pipeline)
- State changes. (e.g. "CONACK received. Setting connected=True")
- Internal handoff (e.g. "incoming message put into message inbox")
- Internal object created or destroyed.
- Timer start, stop, run
~ ~