[TT-1741] performance comparison tool #1424

Merged: 64 commits, Dec 19, 2024

Changes from 9 commits

Commits (64)
0337c29
preliminary version of comparison tool
Tofel Dec 2, 2024
d58b313
k8s resources reading + latest commit finding
Tofel Dec 3, 2024
3466334
fix commit lookup
Tofel Dec 3, 2024
605856c
split report into reusable components vol 1
Tofel Dec 3, 2024
a21de6f
make the tool more abstract
Tofel Dec 3, 2024
17531fd
correct propagation of context, no mutex -> use errgroup instead, one…
Tofel Dec 3, 2024
d574af5
add logger (unused yet)
Tofel Dec 3, 2024
c56cbd7
compose Reporter interface of smaller ones
Tofel Dec 3, 2024
d90596d
fix interfaces
Tofel Dec 3, 2024
f6e07ed
add type, start/end time to segments, constructor for BasicData, get …
Tofel Dec 4, 2024
fa8c711
add standard set of metrics/queries and a new test for them
Tofel Dec 4, 2024
a6bbbd3
add plain-segment-only notes/comments
Tofel Dec 4, 2024
072556d
remove most of resource reporter, what we want is actual cpu/mem usag…
Tofel Dec 4, 2024
7dd3cfa
add benchspy unit tests
Tofel Dec 4, 2024
a037c1c
execute benchspy tests in ci
Tofel Dec 4, 2024
dcdd2e0
fix test coverage check
Tofel Dec 4, 2024
68297e6
Merge remote-tracking branch 'origin/main' into tt-1741-performance-c…
Tofel Dec 6, 2024
36f4bd3
use median instead of average as a standard metric
Tofel Dec 6, 2024
c55d236
add generator unit tests
Tofel Dec 6, 2024
301819b
Merge remote-tracking branch 'origin' into tt-1741-performance-compar…
Tofel Dec 9, 2024
605096a
fix independence of generator tests
Tofel Dec 9, 2024
aa2a4eb
Merge branch 'main' into tt-1741-performance-comparison-tool
Tofel Dec 9, 2024
08bbb92
add Prometheus support
Tofel Dec 10, 2024
6d63f5f
remove ResourceFetcher, now Prometheus is just another QueryExecutor
Tofel Dec 10, 2024
7e27cca
fix existing tests
Tofel Dec 10, 2024
d2b61b7
add unit tests for prometheus
Tofel Dec 11, 2024
c2cf545
fix remaining unit tests
Tofel Dec 11, 2024
c2236dd
add helper methods for fetching current and previous report
Tofel Dec 11, 2024
305088e
more unit tests
Tofel Dec 11, 2024
570ebaa
add working test examples, some docs, small code changes
Tofel Dec 12, 2024
827b282
fix lints
Tofel Dec 13, 2024
a189b41
more docs
Tofel Dec 13, 2024
af2a330
one more doc
Tofel Dec 13, 2024
358108a
rename Generator to Direct
Tofel Dec 13, 2024
5a1c3e7
smoother docs
Tofel Dec 13, 2024
73df456
fix median calculation for missing data in examples
Tofel Dec 13, 2024
b847867
add explanation why p95 of direct and loki might not be the same
Tofel Dec 13, 2024
2c56a03
[Bot] Add automatically generated go documentation (#1474)
app-token-issuer-test-toolings[bot] Dec 13, 2024
7531863
update troubleshooting
skudasov Dec 9, 2024
fcc4c27
more docs
skudasov Dec 9, 2024
14a344e
Remove logs from flakeguard all test results (#1453)
lukaszcl Dec 9, 2024
54cc588
[TT-1725] go doc enhancements vol 2 (add tools, better comment in PR)…
Tofel Dec 10, 2024
f041164
Add metadata to Flakeguard report (#1473)
lukaszcl Dec 11, 2024
2884289
Separate JD database (#1472)
skudasov Dec 11, 2024
ba555a6
Fix url and add node container internal ip (#1477)
b-gopalswami Dec 12, 2024
93c902a
Flakeguard: improve report aggregation performance and omit output fo…
lukaszcl Dec 12, 2024
a3f2385
tiny adjustments to Seth docs (#1482)
Tofel Dec 16, 2024
554bb51
Keep test outputs for Flakeguard in separate fields (#1485)
lukaszcl Dec 16, 2024
d66b740
move back loki client test to lib/client
Tofel Dec 16, 2024
22a0417
do not use custom percentile function
Tofel Dec 17, 2024
9cd0fd7
cr changes, more tests, improved docs, real world examaple
Tofel Dec 18, 2024
d389c00
print table with Direct metrics
Tofel Dec 18, 2024
87d67d9
update docs, divide loki/direct results when casting per generator
Tofel Dec 18, 2024
f5e1c5a
Merge branch 'main' into tt-1741-performance-comparison-tool
Tofel Dec 18, 2024
72e7370
update reports doc, include WASP fix
Tofel Dec 18, 2024
987fee9
[Bot] Add automatically generated go documentation (#1486)
app-token-issuer-test-toolings[bot] Dec 18, 2024
e4d075d
Merge branch 'main' into tt-1741-performance-comparison-tool
Tofel Dec 18, 2024
a8e7ae4
lower cyclomatic complexity
Tofel Dec 18, 2024
befd51c
remove cover.html
Tofel Dec 18, 2024
70bd176
gitignore cover.html
Tofel Dec 18, 2024
cb02f63
Eliminate VU races, unify execution loop, remove cpu check loop (#1505)
skudasov Dec 19, 2024
2822f7c
use newer go doc generator
Tofel Dec 19, 2024
d1e6563
Merge branch 'main' into tt-1741-performance-comparison-tool
Tofel Dec 19, 2024
4edd870
ignore x/net vulerability
Tofel Dec 19, 2024
12 changes: 11 additions & 1 deletion lib/client/loki.go
@@ -171,7 +171,17 @@ func (lc *LokiClient) extractRawLogEntries(lokiResp LokiResponse) []LokiLogEntry

for _, result := range lokiResp.Data.Result {
for _, entry := range result.Values {
-timestamp := entry[0].(string)
+var timestamp string
+if timestampString, ok := entry[0].(string); ok {
+timestamp = timestampString
+} else if timestampInt, ok := entry[0].(int); ok {
+timestamp = fmt.Sprintf("%d", timestampInt)
+} else if timestampFloat, ok := entry[0].(float64); ok {
+timestamp = fmt.Sprintf("%f", timestampFloat)
+} else {
+lc.Logger.Error().Msg("Error parsing timestamp")
+continue
+}
logLine := entry[1].(string)
logEntries = append(logEntries, LokiLogEntry{
Timestamp: timestamp,
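The change above replaces a hard string assertion with handling for string, int, and float64 timestamps. Purely for illustration, the same logic could be expressed as a type switch in a small helper; this is a hedged sketch, not code from the PR, and parseLokiTimestamp is a hypothetical name:

package client

import "fmt"

// parseLokiTimestamp mirrors the logic of the change above: a Loki entry's
// timestamp may arrive as a string, an int, or a float64.
func parseLokiTimestamp(raw interface{}) (string, error) {
	switch v := raw.(type) {
	case string:
		return v, nil
	case int:
		return fmt.Sprintf("%d", v), nil
	case float64:
		return fmt.Sprintf("%f", v), nil
	default:
		return "", fmt.Errorf("unsupported timestamp type %T", raw)
	}
}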
2 changes: 1 addition & 1 deletion wasp/.gitignore
@@ -2,7 +2,7 @@ bin/
.vscode/
.idea/
.direnv/

performance_reports/
k3dvolume/
.private.env
.envrc.ci
11 changes: 11 additions & 0 deletions wasp/benchspy/TO_DO.md
@@ -0,0 +1,11 @@
Known things to do:
- [ ] add logger
[Reviewer comment on this line: "Pls explain benefits of things to do here"]

- [ ] add tests
- [ ] write documentation
- [ ] test with Docker app (focus: resources)
- [ ] test with k8s app (focus: resources)
- [ ] add report builder (?)
- [ ] add wrapper function for executing some code and then creating a report
- [ ] think what to do with errors... do we really need a slice?
- [ ] if TestEnd is zero, default to `time.Now()`?
- [ ] add helper method for a profile that would create a report based on all generators?
107 changes: 107 additions & 0 deletions wasp/benchspy/basic.go
@@ -0,0 +1,107 @@
package benchspy

import (
"encoding/json"
"fmt"
"time"

"github.com/pkg/errors"
"github.com/smartcontractkit/chainlink-testing-framework/wasp"
)

// BasicData is the basic data that is required for a report, common to all reports
type BasicData struct {
TestName string `json:"test_name"`
CommitOrTag string `json:"commit_or_tag"`

// Test metrics
TestStart time.Time `json:"test_start_timestamp"`
TestEnd time.Time `json:"test_end_timestamp"`

// all, generator settings, including segments
GeneratorConfigs map[string]*wasp.Config `json:"generator_configs"`
}

func (b *BasicData) Validate() error {
if b.TestStart.IsZero() {
return errors.New("test start time is missing. We cannot query Loki without a time range. Please set it and try again")
}
if b.TestEnd.IsZero() {
return errors.New("test end time is missing. We cannot query Loki without a time range. Please set it and try again")
}

if len(b.GeneratorConfigs) == 0 {
return errors.New("generator configs are missing. At least one is required. Please set them and try again")
}

return nil
}

func (b *BasicData) IsComparable(otherData BasicData) error {
// are all configs present? do they have the same schedule type? do they have the same segments? is call timeout the same? is rate limit timeout the same?
if len(b.GeneratorConfigs) != len(otherData.GeneratorConfigs) {
return fmt.Errorf("generator configs count is different. Expected %d, got %d", len(b.GeneratorConfigs), len(otherData.GeneratorConfigs))
}

for name1, cfg1 := range b.GeneratorConfigs {
if cfg2, ok := otherData.GeneratorConfigs[name1]; !ok {
return fmt.Errorf("generator config %s is missing from the other report", name1)
} else {
if err := compareGeneratorConfigs(cfg1, cfg2); err != nil {
return err
}
}
}

for name2 := range otherData.GeneratorConfigs {
if _, ok := b.GeneratorConfigs[name2]; !ok {
return fmt.Errorf("generator config %s is missing from the current report", name2)
}
}

// TODO: would be good to be able to check if Gun and VU are the same, but idk yet how we could do that easily [hash the code?]

return nil
}

func compareGeneratorConfigs(cfg1, cfg2 *wasp.Config) error {
if cfg1.LoadType != cfg2.LoadType {
return fmt.Errorf("load types are different. Expected %s, got %s", cfg1.LoadType, cfg2.LoadType)
}

if len(cfg1.Schedule) != len(cfg2.Schedule) {
return fmt.Errorf("schedules are different. Expected %d, got %d", len(cfg1.Schedule), len(cfg2.Schedule))
}

for i, segment1 := range cfg1.Schedule {
segment2 := cfg2.Schedule[i]
if segment1 == nil {
return fmt.Errorf("schedule at index %d is nil in the current report", i)
}
if segment2 == nil {
return fmt.Errorf("schedule at index %d is nil in the other report", i)
}
if *segment1 != *segment2 {
return fmt.Errorf("schedules at index %d are different. Expected %s, got %s", i, mustMarshallSegment(segment1), mustMarshallSegment(segment2))
}
}

if cfg1.CallTimeout != cfg2.CallTimeout {
return fmt.Errorf("call timeouts are different. Expected %s, got %s", cfg1.CallTimeout, cfg2.CallTimeout)
}

if cfg1.RateLimitUnitDuration != cfg2.RateLimitUnitDuration {
return fmt.Errorf("rate limit unit durations are different. Expected %s, got %s", cfg1.RateLimitUnitDuration, cfg2.RateLimitUnitDuration)
}

return nil
}

func mustMarshallSegment(segment *wasp.Segment) string {
segmentBytes, err := json.MarshalIndent(segment, "", " ")
if err != nil {
panic(err)
}

return string(segmentBytes)
}
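To show how BasicData is meant to be used, here is a minimal sketch; the import path, test names, and config values are assumptions for illustration, while Validate and IsComparable are the methods defined above:

package main

import (
	"fmt"
	"time"

	"github.com/smartcontractkit/chainlink-testing-framework/wasp"
	"github.com/smartcontractkit/chainlink-testing-framework/wasp/benchspy"
)

func main() {
	current := benchspy.BasicData{
		TestName:    "benchspy-load-test",
		CommitOrTag: "abc1234",
		TestStart:   time.Now().Add(-10 * time.Minute),
		TestEnd:     time.Now(),
		GeneratorConfigs: map[string]*wasp.Config{
			// in a real test this would be the full config taken from the generator
			"gen-1": {},
		},
	}

	// Validate ensures the time range and at least one generator config are set.
	if err := current.Validate(); err != nil {
		fmt.Println("invalid report data:", err)
		return
	}

	// IsComparable checks that both reports were produced with equivalent
	// generator configs (same load type, schedule segments, timeouts).
	previous := current // in practice, loaded from a previously stored report
	if err := current.IsComparable(previous); err != nil {
		fmt.Println("reports are not comparable:", err)
	}
}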
32 changes: 32 additions & 0 deletions wasp/benchspy/log.go
@@ -0,0 +1,32 @@
package benchspy

import (
"os"

"github.com/rs/zerolog"
"github.com/rs/zerolog/log"
)

const (
LogLevelEnvVar = "BENCHSPY_LOG_LEVEL"
)

var (
L zerolog.Logger
)

func init() {
initDefaultLogging()
}

func initDefaultLogging() {
lvlStr := os.Getenv(LogLevelEnvVar)
if lvlStr == "" {
lvlStr = "info"
}
lvl, err := zerolog.ParseLevel(lvlStr)
if err != nil {
panic(err)
}
L = log.Output(zerolog.ConsoleWriter{Out: os.Stderr}).Level(lvl)
}
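As a brief usage note: the level is read from BENCHSPY_LOG_LEVEL exactly once, in init, so it has to be set in the environment before the package is loaded. A hedged sketch follows; the function name and log fields are illustrative assumptions:

package benchspy_test

import (
	"github.com/smartcontractkit/chainlink-testing-framework/wasp/benchspy"
)

// useLogger is a hypothetical snippet showing the shared benchspy logger.
// Run tests with e.g.: BENCHSPY_LOG_LEVEL=debug go test ./wasp/benchspy/...
func useLogger() {
	// An unknown level makes initDefaultLogging panic; an empty value defaults to "info".
	benchspy.L.Info().Str("component", "benchspy").Msg("starting performance comparison")
	benchspy.L.Debug().Msg("visible only when the level is debug or trace")
}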
161 changes: 161 additions & 0 deletions wasp/benchspy/loki.go
@@ -0,0 +1,161 @@
package benchspy

import (
"context"
"fmt"
"net/url"
"reflect"
"strings"
"time"

"github.com/pkg/errors"
"github.com/smartcontractkit/chainlink-testing-framework/lib/client"
"github.com/smartcontractkit/chainlink-testing-framework/wasp"
"golang.org/x/sync/errgroup"
)

func NewLokiQueryExecutor(queries map[string]string, lokiConfig *wasp.LokiConfig) *LokiQueryExecutor {
return &LokiQueryExecutor{
Kind: "loki",
Queries: queries,
LokiConfig: lokiConfig,
QueryResults: make(map[string][]string),
}
}

type LokiQueryExecutor struct {
Kind string `json:"kind"`
// Test metrics
StartTime time.Time `json:"start_time"`
EndTime time.Time `json:"end_time"`

// Performance queries
// a map of name to query template, ex: "average cpu usage": "avg(rate(cpu_usage_seconds_total[5m]))"
Queries map[string]string `json:"queries"`
// Performance queries results
// can be anything, avg RPS, amount of errors, 95th percentile of CPU utilization, etc
QueryResults map[string][]string `json:"query_results"`
// In case something went wrong; not sure we really need it
Errors []error `json:"errors"`

LokiConfig *wasp.LokiConfig `json:"-"`
}

func (l *LokiQueryExecutor) Results() map[string][]string {
return l.QueryResults
}

func (l *LokiQueryExecutor) IsComparable(otherQueryExecutor QueryExecutor) error {
otherType := reflect.TypeOf(otherQueryExecutor)

if otherType != reflect.TypeOf(l) {
return fmt.Errorf("expected type %s, got %s", reflect.TypeOf(l), otherType)
}

return l.compareLokiQueries(otherQueryExecutor.(*LokiQueryExecutor).Queries)
}

func (l *LokiQueryExecutor) Validate() error {
if len(l.Queries) == 0 {
return errors.New("there are no Loki queries, there's nothing to fetch. Please set them and try again")
}
if l.LokiConfig == nil {
return errors.New("loki config is missing. Please set it and try again")
}

return nil
}

func (l *LokiQueryExecutor) Execute(ctx context.Context) error {
splitAuth := strings.Split(l.LokiConfig.BasicAuth, ":")
var basicAuth client.LokiBasicAuth
if len(splitAuth) == 2 {
basicAuth = client.LokiBasicAuth{
Login: splitAuth[0],
Password: splitAuth[1],
}
}

l.QueryResults = make(map[string][]string)
resultCh := make(chan map[string][]string, len(l.Queries))
errGroup, errCtx := errgroup.WithContext(ctx)

for name, query := range l.Queries {
errGroup.Go(func() error {
queryParams := client.LokiQueryParams{
Query: query,
StartTime: l.StartTime,
EndTime: l.EndTime,
Limit: 1000, //TODO make this configurable
}

parsedLokiUrl, err := url.Parse(l.LokiConfig.URL)
if err != nil {
return errors.Wrapf(err, "failed to parse Loki URL %s", l.LokiConfig.URL)
}

lokiUrl := parsedLokiUrl.Scheme + "://" + parsedLokiUrl.Host
lokiClient := client.NewLokiClient(lokiUrl, l.LokiConfig.TenantID, basicAuth, queryParams)

rawLogs, err := lokiClient.QueryLogs(errCtx)
if err != nil {
return errors.Wrapf(err, "failed to query logs for %s", name)
}

resultMap := make(map[string][]string)
for _, log := range rawLogs {
resultMap[name] = append(resultMap[name], log.Log)
}

select {
case resultCh <- resultMap:
return nil
case <-errCtx.Done():
return errCtx.Err() // Allows goroutine to exit if timeout occurs
}
})
}

if err := errGroup.Wait(); err != nil {
return errors.Wrap(err, "failed to execute Loki queries")
}

for i := 0; i < len(l.Queries); i++ {
result := <-resultCh
for name, logs := range result {
l.QueryResults[name] = logs
}
}

return nil
}

func (l *LokiQueryExecutor) compareLokiQueries(other map[string]string) error {
this := l.Queries
if len(this) != len(other) {
return fmt.Errorf("queries count is different. Expected %d, got %d", len(this), len(other))
}

for name1, query1 := range this {
if query2, ok := other[name1]; !ok {
return fmt.Errorf("query %s is missing from the other report", name1)
} else {
if query1 != query2 {
return fmt.Errorf("query %s is different. Expected %s, got %s", name1, query1, query2)
}
}
}

for name2 := range other {
if _, ok := this[name2]; !ok {
return fmt.Errorf("query %s is missing from the current report", name2)
}
}

return nil
}

func (l *LokiQueryExecutor) TimeRange(start, end time.Time) {
l.StartTime = start
l.EndTime = end
}
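To tie the executor's pieces together, a minimal sketch of the intended call sequence (construct, set the time range, validate, execute, read results); the import path, Loki URL, tenant ID, and LogQL query are placeholder assumptions:

package main

import (
	"context"
	"fmt"
	"time"

	"github.com/smartcontractkit/chainlink-testing-framework/wasp"
	"github.com/smartcontractkit/chainlink-testing-framework/wasp/benchspy"
)

func main() {
	queries := map[string]string{
		// query name -> LogQL query (placeholder stream selector)
		"responses": `{test_name="benchspy-load-test"}`,
	}

	lokiConfig := &wasp.LokiConfig{
		URL:      "http://localhost:3100/loki/api/v1/push",
		TenantID: "promtail",
	}

	executor := benchspy.NewLokiQueryExecutor(queries, lokiConfig)
	// Execute queries Loki over an explicit time range, so set it first.
	executor.TimeRange(time.Now().Add(-15 * time.Minute), time.Now())

	if err := executor.Validate(); err != nil {
		panic(err)
	}

	ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
	defer cancel()

	if err := executor.Execute(ctx); err != nil {
		panic(err)
	}

	// Results returns raw log lines grouped by query name.
	for name, lines := range executor.Results() {
		fmt.Printf("%s: %d entries\n", name, len(lines))
	}
}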