
[TT-590] logwatch with multiple targets and retries #789

Merged
Tofel merged 44 commits into main from tt_590_custom_log_targets_retries on Dec 7, 2023

Conversation

@Tofel (Contributor) commented Nov 28, 2023

This PR introduces the ability to easily add new logging targets and ships with three of them: Loki, file and in-memory. It also adds retries for log production failures.

It is possible to use any number of logging targets concurrently. By default no logging targets are set; they should be specified using the env variable LOGWATCH_LOG_TARGETS (case-insensitive). For testing purposes I also added a functional option to set them.
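
For illustration, here is a minimal sketch of how that env variable could be parsed; the comma separator, the helper name envLogTargets and the exact target spellings are assumptions, not necessarily the real implementation:

```go
package logwatch

import (
	"fmt"
	"os"
	"strings"
)

// envLogTargets parses LOGWATCH_LOG_TARGETS case-insensitively into a list of
// normalised target names, e.g. "Loki, File" -> ["loki", "file"].
func envLogTargets() ([]string, error) {
	raw := os.Getenv("LOGWATCH_LOG_TARGETS")
	if raw == "" {
		return nil, nil // no targets are set by default
	}
	known := map[string]struct{}{"loki": {}, "file": {}, "in-memory": {}}
	var targets []string
	for _, part := range strings.Split(raw, ",") {
		t := strings.ToLower(strings.TrimSpace(part))
		if t == "" {
			continue
		}
		if _, ok := known[t]; !ok {
			return nil, fmt.Errorf("unknown log target: %s", t)
		}
		targets = append(targets, t)
	}
	return targets, nil
}
```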

The constructor also accepts a couple more variadic options (see the sketch after this list):

  • log producer timeout
  • log producer retry count
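
A sketch of the functional-options shape these could take; the option names, field names and default values below are illustrative assumptions rather than the exact API:

```go
package logwatch

import "time"

// Option mutates the LogWatch configuration during construction.
type Option func(*LogWatch)

type LogWatch struct {
	logProducerTimeout    time.Duration
	logProducerRetryLimit int
	// other fields omitted
}

// WithLogProducerTimeout overrides how long we wait for the log producer.
func WithLogProducerTimeout(d time.Duration) Option {
	return func(lw *LogWatch) { lw.logProducerTimeout = d }
}

// WithLogProducerRetryLimit overrides how many restarts a failing producer gets.
func WithLogProducerRetryLimit(n int) Option {
	return func(lw *LogWatch) { lw.logProducerRetryLimit = n }
}

// NewLogWatch applies assumed defaults, then any user-supplied options.
func NewLogWatch(opts ...Option) *LogWatch {
	lw := &LogWatch{
		logProducerTimeout:    10 * time.Second,
		logProducerRetryLimit: 10,
	}
	for _, opt := range opts {
		opt(lw)
	}
	return lw
}
```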

It also performs some log target validations and operations:

  • checking whether there is a known handler for every requested log target
  • deactivating known handlers that weren't requested as targets
    Also, the Loki client is now lazily initialised the first time it is requested, instead of always being initialised in the constructor (even if streaming logs to Loki was not requested)
  • reading or generating a run_id that's used to group test logs together in Loki (in CI it's GitHub's run_id; locally it's a generated value stored in the .run.id file; a read-or-generate sketch follows this list)
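
A sketch of that read-or-generate step; reading GITHUB_RUN_ID and generating a UUID via github.com/google/uuid are assumptions about how it's done, while the .run.id file name comes from the description above:

```go
package logwatch

import (
	"os"
	"strings"

	"github.com/google/uuid"
)

// getOrCreateRunID returns GitHub's run id in CI; locally it returns a
// generated value persisted in .run.id so repeated runs share the same id.
func getOrCreateRunID() (string, error) {
	if id := os.Getenv("GITHUB_RUN_ID"); id != "" {
		return id, nil
	}
	if data, err := os.ReadFile(".run.id"); err == nil {
		if id := strings.TrimSpace(string(data)); id != "" {
			return id, nil
		}
	}
	id := uuid.NewString()
	if err := os.WriteFile(".run.id", []byte(id), 0o644); err != nil {
		return "", err
	}
	return id, nil
}
```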

Apart from that, a lot of other things were done, like:

  • removing pattern matching, per conversations with @skudasov
  • no longer saving all logs in memory, per conversations with @skudasov (although you can achieve similar behaviour using the in-memory log target)
  • adding a detached goroutine that listens for log producer errors and tries to restart the producer and resume listening to logs, up to the given retry limit (a sketch follows this list)
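
A sketch of such a detached retry loop; the error channel and the restart callback are stand-ins for whatever the real producer exposes:

```go
package logwatch

import (
	"context"
	"log"
	"time"
)

// retryLogProducer listens for producer errors on errCh and restarts the
// producer up to retryLimit times; it is meant to run as a detached
// goroutine: go retryLogProducer(ctx, errCh, restart, 5).
func retryLogProducer(ctx context.Context, errCh <-chan error, restart func(context.Context) error, retryLimit int) {
	attempts := 0
	for {
		select {
		case <-ctx.Done():
			return
		case err := <-errCh:
			attempts++
			if attempts > retryLimit {
				log.Printf("log producer failed, retry limit %d reached: %v", retryLimit, err)
				return
			}
			log.Printf("log producer failed (attempt %d/%d), restarting: %v", attempts, retryLimit, err)
			time.Sleep(time.Second)
			if rerr := restart(ctx); rerr != nil {
				log.Printf("restart failed: %v", rerr)
			}
		}
	}
}
```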

Last, but not least: I've added buffering of logs in a temp file. In general the flow looks like this (a sketch follows this list):

  • when starting the consumer we create a temp file and a gob encoder and store references to both
  • when a new log arrives we encode it and save it in the temp file
  • when we want to flush logs to whatever targets LogWatch has, we read the temp file again, decode the logs and let each handler handle them (we read one log at a time instead of all of them at once to avoid memory issues, but in the future we should perhaps batch reads)
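
A minimal sketch of that flow with encoding/gob, assuming a simplified LogEntry payload (the real log type and handler signature will differ):

```go
package logwatch

import (
	"encoding/gob"
	"io"
	"os"
)

// LogEntry is a simplified stand-in for the real log payload type.
type LogEntry struct {
	ContainerName string
	Content       []byte
}

// newLogBuffer creates the temp file and a gob encoder bound to it,
// mirroring what happens when the consumer starts.
func newLogBuffer() (*os.File, *gob.Encoder, error) {
	f, err := os.CreateTemp("", "logwatch-*.gob")
	if err != nil {
		return nil, nil, err
	}
	return f, gob.NewEncoder(f), nil
}

// flushLogs rewinds the temp file and hands entries to the handler one at a
// time, so the whole buffer never has to sit in memory.
func flushLogs(f *os.File, handle func(LogEntry) error) error {
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return err
	}
	dec := gob.NewDecoder(f)
	for {
		var entry LogEntry
		if err := dec.Decode(&entry); err == io.EOF {
			return nil
		} else if err != nil {
			return err
		}
		if err := handle(entry); err != nil {
			return err
		}
	}
}
```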

Important: when we flush logs, the consumer state is set to done. From then on it won't accept any new logs, and sending them will log an error. Also, because the temp file is read and written through a shared reference, we cannot read and write at the same time, as that would mess up the cursor position; that's why a mutex synchronises these actions (see the sketch below).
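
Reusing LogEntry and flushLogs from the previous sketch, this is roughly how the done flag and the mutex around the shared file reference could fit together; field and method names are assumptions:

```go
package logwatch

import (
	"encoding/gob"
	"errors"
	"os"
	"sync"
)

// bufferingConsumer reduces the real consumer to the parts discussed above:
// one shared file handle, one encoder, one mutex and a done flag.
type bufferingConsumer struct {
	mu       sync.Mutex
	tempFile *os.File
	enc      *gob.Encoder
	done     bool
}

// Accept encodes a single log into the temp file, unless the consumer has
// already been flushed and marked as done.
func (c *bufferingConsumer) Accept(entry LogEntry) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.done {
		return errors.New("consumer is done, not accepting logs")
	}
	return c.enc.Encode(entry)
}

// Flush marks the consumer as done and replays the buffered logs; holding the
// mutex keeps reads from racing with writes on the shared cursor position.
func (c *bufferingConsumer) Flush(handle func(LogEntry) error) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.done = true
	return flushLogs(c.tempFile, handle)
}
```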

Based on run_id and container_ids it's possible to save a test summary with the log location for the tests that were executed. It can be used to display a link to a Grafana dashboard with the logs of failed tests.
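
A sketch of what such a saved summary entry might look like; the field set, JSON layout and output path are assumptions made for illustration:

```go
package logwatch

import (
	"encoding/json"
	"os"
)

// TestLogSummary is a hypothetical shape for the per-test summary that later
// gets turned into a Grafana dashboard link for failed tests.
type TestLogSummary struct {
	TestName     string   `json:"test_name"`
	RunID        string   `json:"run_id"`
	ContainerIDs []string `json:"container_ids"`
	GrafanaURL   string   `json:"grafana_url"`
}

// saveTestSummary writes the summary as JSON so CI tooling can pick it up and
// print dashboard links for failed tests.
func saveTestSummary(path string, summary TestLogSummary) error {
	data, err := json.MarshalIndent(summary, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}
```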

It is recommended to connect/disconnect LogWatch to/from containers using PostStarts/PostStops hooks instead of doing it manually.
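
A sketch of that wiring with testcontainers-go lifecycle hooks; ConnectContainer/DisconnectContainer are assumed method names for the LogWatch side, not the confirmed API:

```go
package logwatch_test

import (
	"context"

	"github.com/testcontainers/testcontainers-go"
)

// logConnector is an assumed subset of the LogWatch API used by the hooks.
type logConnector interface {
	ConnectContainer(ctx context.Context, c testcontainers.Container, prefix string) error
	DisconnectContainer(c testcontainers.Container) error
}

// withLogWatchHooks connects a container to LogWatch right after it starts
// and disconnects it right after it stops, instead of doing either manually.
func withLogWatchHooks(lw logConnector, prefix string) testcontainers.ContainerLifecycleHooks {
	return testcontainers.ContainerLifecycleHooks{
		PostStarts: []testcontainers.ContainerHook{
			func(ctx context.Context, c testcontainers.Container) error {
				return lw.ConnectContainer(ctx, c, prefix)
			},
		},
		PostStops: []testcontainers.ContainerHook{
			func(ctx context.Context, c testcontainers.Container) error {
				return lw.DisconnectContainer(c)
			},
		},
	}
}
```

The returned hooks can then be appended to the container request's LifecycleHooks.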

Known issues existing in previous versions:

  • if the test is too short, logs might not be sent to Loki before the test finishes (we should consider adding a way to flush the Loki client on shutdown)

New env variables (a configuration sketch follows this list):

  • GRAFANA_URL -- URL to the Grafana instance that is connected to the Loki instance to which logs were streamed
  • GRAFANA_DATASOURCE -- Loki datasource id that will be used when building the URL
  • LOKI_TENANT_ID
  • LOKI_URL
  • LOKI_BASIC_AUTH
  • LOGWATCH_LOG_TARGETS -- "file", "loki" or "in-memory"
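
For local runs, the new variables can be set from a test, e.g. with t.Setenv; all values below are placeholders and the comma separator for LOGWATCH_LOG_TARGETS is an assumption:

```go
package logwatch_test

import "testing"

func TestWithLokiAndFileTargets(t *testing.T) {
	// placeholder values; point these at a real Loki/Grafana deployment
	t.Setenv("LOGWATCH_LOG_TARGETS", "loki,file")
	t.Setenv("LOKI_URL", "http://localhost:3100/loki/api/v1/push")
	t.Setenv("LOKI_TENANT_ID", "promtail")
	t.Setenv("LOKI_BASIC_AUTH", "user:password")
	t.Setenv("GRAFANA_URL", "http://localhost:3000")
	t.Setenv("GRAFANA_DATASOURCE", "loki")

	// ... start containers and LogWatch as usual
}
```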

Chainlink repo example: https://github.com/smartcontractkit/chainlink/actions/runs/7105718638

@codecov-commenter commented Nov 28, 2023

Codecov Report

Attention: 374 lines in your changes are missing coverage. Please review.

Comparison: base (ccd4668) 31.04% vs. head (6a0f414) 32.38%.

Files                            Patch %   Lines
logstream/logstream.go           57.31%    206 Missing and 39 partials ⚠️
logstream/logstream_handlers.go  21.81%    122 Missing and 7 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #789      +/-   ##
==========================================
+ Coverage   31.04%   32.38%   +1.34%     
==========================================
  Files          41       42       +1     
  Lines        5370     5993     +623     
==========================================
+ Hits         1667     1941     +274     
- Misses       3524     3832     +308     
- Partials      179      220      +41     


}
}

// Stop stops the consumer and closes temp file
Collaborator

Maybe we change the comment, I don't see any file closing.

return errors.Wrap(err, "failed to open log file. File logging stopped")
}

defer logFile.Close()
Collaborator

We are closing and opening a file for each line, can we open the file once?
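
A sketch of what the suggested fix could look like: open the file lazily once, keep the handle on the handler, and close it only on shutdown (the handler shape and file name are hypothetical, and concurrency is ignored for brevity):

```go
package logwatch

import "os"

// fileHandler keeps one open handle for the whole run instead of opening and
// closing the log file for every single line.
type fileHandler struct {
	logFile *os.File
}

func (h *fileHandler) Handle(line string) error {
	if h.logFile == nil {
		f, err := os.OpenFile("logwatch.log", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
		if err != nil {
			return err
		}
		h.logFile = f
	}
	_, err := h.logFile.WriteString(line + "\n")
	return err
}

// Close is called once when the handler shuts down.
func (h *fileHandler) Close() error {
	if h.logFile == nil {
		return nil
	}
	return h.logFile.Close()
}
```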

@cl-sonarqube-production

SonarQube Quality Gate

Quality Gate failed

Failed condition: 6.99% Duplicated Lines (%) on New Code (is greater than 3%)

See analysis details on SonarQube

@Tofel Tofel merged commit 2bd2b2d into main Dec 7, 2023
12 of 14 checks passed
@Tofel Tofel deleted the tt_590_custom_log_targets_retries branch December 7, 2023 17:51