
[TT-590] logwatch with multiple targets and retries #789

Merged
Tofel merged 44 commits into main from tt_590_custom_log_targets_retries on Dec 7, 2023

Conversation

@Tofel (Contributor) commented Nov 28, 2023

This PR introduces the ability to easily add new logging targets and ships with three of them: Loki, file and in-memory. It also adds retries for log production failures.

It is possible to use any number of logging targets concurrently. By default no logging targets are set; they should be specified using the env variable LOGWATCH_LOG_TARGETS (case-insensitive). For testing purposes I also added a functional option to set them.
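
For illustration, here is a minimal sketch of how that env variable could be parsed; the comma separator, the helper name envLogTargets and the exact target spellings are assumptions, not necessarily the real implementation:

```go
package logwatch

import (
	"fmt"
	"os"
	"strings"
)

// envLogTargets parses LOGWATCH_LOG_TARGETS case-insensitively into a list of
// normalised target names, e.g. "Loki, File" -> ["loki", "file"].
func envLogTargets() ([]string, error) {
	raw := os.Getenv("LOGWATCH_LOG_TARGETS")
	if raw == "" {
		return nil, nil // no targets are set by default
	}
	known := map[string]struct{}{"loki": {}, "file": {}, "in-memory": {}}
	var targets []string
	for _, part := range strings.Split(raw, ",") {
		t := strings.ToLower(strings.TrimSpace(part))
		if t == "" {
			continue
		}
		if _, ok := known[t]; !ok {
			return nil, fmt.Errorf("unknown log target: %s", t)
		}
		targets = append(targets, t)
	}
	return targets, nil
}
```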

The constructor also accepts a couple more variadic options (see the sketch after this list):

  • log producer timeout
  • log producer retry count
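
A sketch of the functional-options shape these could take; the option names, field names and default values below are illustrative assumptions rather than the exact API:

```go
package logwatch

import "time"

// Option mutates the LogWatch configuration during construction.
type Option func(*LogWatch)

type LogWatch struct {
	logProducerTimeout    time.Duration
	logProducerRetryLimit int
	// other fields omitted
}

// WithLogProducerTimeout overrides how long we wait for the log producer.
func WithLogProducerTimeout(d time.Duration) Option {
	return func(lw *LogWatch) { lw.logProducerTimeout = d }
}

// WithLogProducerRetryLimit overrides how many restarts a failing producer gets.
func WithLogProducerRetryLimit(n int) Option {
	return func(lw *LogWatch) { lw.logProducerRetryLimit = n }
}

// NewLogWatch applies assumed defaults, then any user-supplied options.
func NewLogWatch(opts ...Option) *LogWatch {
	lw := &LogWatch{
		logProducerTimeout:    10 * time.Second,
		logProducerRetryLimit: 10,
	}
	for _, opt := range opts {
		opt(lw)
	}
	return lw
}
```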

It also performs some log target validations and operations:

  • checking whether there is a known handler for every requested log target
  • deactivating known handlers that weren't requested as targets
    Also, the Loki client is now lazily initialised the first time it is requested, instead of always being initialised in the constructor (even if streaming logs to Loki was not requested)
  • reading or generating a run_id that's used to group test logs together in Loki (in CI it's GitHub's run_id; locally it's a generated value stored in the .run.id file; a read-or-generate sketch follows this list)
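
A sketch of that read-or-generate step; reading GITHUB_RUN_ID and generating a UUID via github.com/google/uuid are assumptions about how it's done, while the .run.id file name comes from the description above:

```go
package logwatch

import (
	"os"
	"strings"

	"github.com/google/uuid"
)

// getOrCreateRunID returns GitHub's run id in CI; locally it returns a
// generated value persisted in .run.id so repeated runs share the same id.
func getOrCreateRunID() (string, error) {
	if id := os.Getenv("GITHUB_RUN_ID"); id != "" {
		return id, nil
	}
	if data, err := os.ReadFile(".run.id"); err == nil {
		if id := strings.TrimSpace(string(data)); id != "" {
			return id, nil
		}
	}
	id := uuid.NewString()
	if err := os.WriteFile(".run.id", []byte(id), 0o644); err != nil {
		return "", err
	}
	return id, nil
}
```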

Apart from that, a lot of other things were done, like:

  • removing pattern matching, per conversations with @skudasov
  • no longer saving all logs in memory, per conversations with @skudasov (although you can achieve similar behaviour using the in-memory log target)
  • adding a detached goroutine that listens for log producer errors and tries to restart the producer and resume listening to logs, up to the given retry limit (a sketch follows this list)
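
A sketch of such a detached retry loop; the error channel and the restart callback are stand-ins for whatever the real producer exposes:

```go
package logwatch

import (
	"context"
	"log"
	"time"
)

// retryLogProducer listens for producer errors on errCh and restarts the
// producer up to retryLimit times; it is meant to run as a detached
// goroutine: go retryLogProducer(ctx, errCh, restart, 5).
func retryLogProducer(ctx context.Context, errCh <-chan error, restart func(context.Context) error, retryLimit int) {
	attempts := 0
	for {
		select {
		case <-ctx.Done():
			return
		case err := <-errCh:
			attempts++
			if attempts > retryLimit {
				log.Printf("log producer failed, retry limit %d reached: %v", retryLimit, err)
				return
			}
			log.Printf("log producer failed (attempt %d/%d), restarting: %v", attempts, retryLimit, err)
			time.Sleep(time.Second)
			if rerr := restart(ctx); rerr != nil {
				log.Printf("restart failed: %v", rerr)
			}
		}
	}
}
```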

Last, but not least: I've added buffering of logs in a temp file. In general the flow looks like this (a sketch follows this list):

  • when starting the consumer we create a temp file and a gob encoder and store references to both
  • when a new log arrives we encode it and save it in the temp file
  • when we want to flush logs to whatever targets LogWatch has, we read the temp file again, decode the logs and let each handler handle them (we read one log at a time instead of all of them at once to avoid memory issues, but in the future we should perhaps batch reads)
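
A minimal sketch of that flow with encoding/gob, assuming a simplified LogEntry payload (the real log type and handler signature will differ):

```go
package logwatch

import (
	"encoding/gob"
	"io"
	"os"
)

// LogEntry is a simplified stand-in for the real log payload type.
type LogEntry struct {
	ContainerName string
	Content       []byte
}

// newLogBuffer creates the temp file and a gob encoder bound to it,
// mirroring what happens when the consumer starts.
func newLogBuffer() (*os.File, *gob.Encoder, error) {
	f, err := os.CreateTemp("", "logwatch-*.gob")
	if err != nil {
		return nil, nil, err
	}
	return f, gob.NewEncoder(f), nil
}

// flushLogs rewinds the temp file and hands entries to the handler one at a
// time, so the whole buffer never has to sit in memory.
func flushLogs(f *os.File, handle func(LogEntry) error) error {
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return err
	}
	dec := gob.NewDecoder(f)
	for {
		var entry LogEntry
		if err := dec.Decode(&entry); err == io.EOF {
			return nil
		} else if err != nil {
			return err
		}
		if err := handle(entry); err != nil {
			return err
		}
	}
}
```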

Important: when we flush logs, the consumer state is set to done. From then on it won't accept any new logs, and sending them will log an error. Also, because the temp file is read and written through a shared reference, we cannot read and write at the same time, as that would mess up the cursor position; that's why a mutex synchronises these actions (see the sketch below).
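
Reusing LogEntry and flushLogs from the previous sketch, this is roughly how the done flag and the mutex around the shared file reference could fit together; field and method names are assumptions:

```go
package logwatch

import (
	"encoding/gob"
	"errors"
	"os"
	"sync"
)

// bufferingConsumer reduces the real consumer to the parts discussed above:
// one shared file handle, one encoder, one mutex and a done flag.
type bufferingConsumer struct {
	mu       sync.Mutex
	tempFile *os.File
	enc      *gob.Encoder
	done     bool
}

// Accept encodes a single log into the temp file, unless the consumer has
// already been flushed and marked as done.
func (c *bufferingConsumer) Accept(entry LogEntry) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.done {
		return errors.New("consumer is done, not accepting logs")
	}
	return c.enc.Encode(entry)
}

// Flush marks the consumer as done and replays the buffered logs; holding the
// mutex keeps reads from racing with writes on the shared cursor position.
func (c *bufferingConsumer) Flush(handle func(LogEntry) error) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.done = true
	return flushLogs(c.tempFile, handle)
}
```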

Based on run_id and container_ids it's possible to save a test summary with the log location for the tests that were executed. It can be used to display a link to a Grafana dashboard with the logs of failed tests.
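
A sketch of what such a saved summary entry might look like; the field set, JSON layout and output path are assumptions made for illustration:

```go
package logwatch

import (
	"encoding/json"
	"os"
)

// TestLogSummary is a hypothetical shape for the per-test summary that later
// gets turned into a Grafana dashboard link for failed tests.
type TestLogSummary struct {
	TestName     string   `json:"test_name"`
	RunID        string   `json:"run_id"`
	ContainerIDs []string `json:"container_ids"`
	GrafanaURL   string   `json:"grafana_url"`
}

// saveTestSummary writes the summary as JSON so CI tooling can pick it up and
// print dashboard links for failed tests.
func saveTestSummary(path string, summary TestLogSummary) error {
	data, err := json.MarshalIndent(summary, "", "  ")
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}
```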

It is recommended to connect/disconnect LogWatch to/from containers using PostStarts/PostStops hooks instead of doing it manually.
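
A sketch of that wiring with testcontainers-go lifecycle hooks; ConnectContainer/DisconnectContainer are assumed method names for the LogWatch side, not the confirmed API:

```go
package logwatch_test

import (
	"context"

	"github.com/testcontainers/testcontainers-go"
)

// logConnector is an assumed subset of the LogWatch API used by the hooks.
type logConnector interface {
	ConnectContainer(ctx context.Context, c testcontainers.Container, prefix string) error
	DisconnectContainer(c testcontainers.Container) error
}

// withLogWatchHooks connects a container to LogWatch right after it starts
// and disconnects it right after it stops, instead of doing either manually.
func withLogWatchHooks(lw logConnector, prefix string) testcontainers.ContainerLifecycleHooks {
	return testcontainers.ContainerLifecycleHooks{
		PostStarts: []testcontainers.ContainerHook{
			func(ctx context.Context, c testcontainers.Container) error {
				return lw.ConnectContainer(ctx, c, prefix)
			},
		},
		PostStops: []testcontainers.ContainerHook{
			func(ctx context.Context, c testcontainers.Container) error {
				return lw.DisconnectContainer(c)
			},
		},
	}
}
```

The returned hooks can then be appended to the container request's LifecycleHooks.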

Known issues existing in previous versions:

  • if the test is too short, logs might not be sent to Loki before the test finishes (we should consider adding a way to flush the Loki client on shutdown)

New env variables (a configuration sketch follows this list):

  • GRAFANA_URL -- URL to the Grafana instance that is connected to the Loki instance to which logs were streamed
  • GRAFANA_DATASOURCE -- Loki datasource id that will be used when building the URL
  • LOKI_TENANT_ID
  • LOKI_URL
  • LOKI_BASIC_AUTH
  • LOGWATCH_LOG_TARGETS -- "file", "loki" or "in-memory"
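
For local runs, the new variables can be set from a test, e.g. with t.Setenv; all values below are placeholders and the comma separator for LOGWATCH_LOG_TARGETS is an assumption:

```go
package logwatch_test

import "testing"

func TestWithLokiAndFileTargets(t *testing.T) {
	// placeholder values; point these at a real Loki/Grafana deployment
	t.Setenv("LOGWATCH_LOG_TARGETS", "loki,file")
	t.Setenv("LOKI_URL", "http://localhost:3100/loki/api/v1/push")
	t.Setenv("LOKI_TENANT_ID", "promtail")
	t.Setenv("LOKI_BASIC_AUTH", "user:password")
	t.Setenv("GRAFANA_URL", "http://localhost:3000")
	t.Setenv("GRAFANA_DATASOURCE", "loki")

	// ... start containers and LogWatch as usual
}
```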

Chainlink repo example: https://github.com/smartcontractkit/chainlink/actions/runs/7105718638

@codecov-commenter commented Nov 28, 2023

Codecov Report

Attention: 374 lines in your changes are missing coverage. Please review.

Comparison: base (ccd4668) 31.04% vs. head (6a0f414) 32.38%.

Files                            Patch %   Lines
logstream/logstream.go           57.31%    206 Missing and 39 partials ⚠️
logstream/logstream_handlers.go  21.81%    122 Missing and 7 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #789      +/-   ##
==========================================
+ Coverage   31.04%   32.38%   +1.34%     
==========================================
  Files          41       42       +1     
  Lines        5370     5993     +623     
==========================================
+ Hits         1667     1941     +274     
- Misses       3524     3832     +308     
- Partials      179      220      +41     


}
}

// Stop stops the consumer and closes temp file
Collaborator

Maybe we change the comment, I don't see any file closing.

return errors.Wrap(err, "failed to open log file. File logging stopped")
}

defer logFile.Close()
Collaborator

We are closing and opening a file for each line, can we open the file once?
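
A sketch of what the suggested fix could look like: open the file lazily once, keep the handle on the handler, and close it only on shutdown (the handler shape and file name are hypothetical, and concurrency is ignored for brevity):

```go
package logwatch

import "os"

// fileHandler keeps one open handle for the whole run instead of opening and
// closing the log file for every single line.
type fileHandler struct {
	logFile *os.File
}

func (h *fileHandler) Handle(line string) error {
	if h.logFile == nil {
		f, err := os.OpenFile("logwatch.log", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
		if err != nil {
			return err
		}
		h.logFile = f
	}
	_, err := h.logFile.WriteString(line + "\n")
	return err
}

// Close is called once when the handler shuts down.
func (h *fileHandler) Close() error {
	if h.logFile == nil {
		return nil
	}
	return h.logFile.Close()
}
```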

@cl-sonarqube-production

SonarQube Quality Gate

Quality Gate failed

Failed condition: 6.99% Duplicated Lines (%) on New Code (is greater than 3%)

See analysis details on SonarQube

@Tofel Tofel merged commit 2bd2b2d into main Dec 7, 2023
12 of 14 checks passed
@Tofel Tofel deleted the tt_590_custom_log_targets_retries branch December 7, 2023 17:51