-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
querylog
: json format version 2
#15271
querylog
: json format version 2
#15271
Conversation
Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #15271 +/- ##
==========================================
+ Coverage 67.38% 67.50% +0.12%
==========================================
Files 1560 1562 +2
Lines 192752 193439 +687
==========================================
+ Hits 129886 130582 +696
+ Misses 62866 62857 -9 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:
If no action is taken within 7 days, this PR will be closed. |
EDIT: clicked wrong PR tab, this comment was for #14460 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CC @vmg because you just worked on some code in this area.
|
||
// NewBindVariable returns a wrapped *querypb.BindVariable object. | ||
func NewBindVariable(bv *querypb.BindVariable) BindVariable { | ||
newBv := BindVariable{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We know the capacity of newBv.Values
up front and should preallocate it here before the loop.
|
||
// MarshalJSON renders the BindVariableValue as json and optionally redacts the value. | ||
func (bv BindVariableValue) MarshalJSON() ([]byte, error) { | ||
out := map[string]interface{}{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out := map[string]interface{}{ | |
out := map[string]any{ |
For new code IMO.
|
||
// bindVariablesValuesToProto converts a slice of *BindVariableValue to *querypb.Value. | ||
func bindVariablesValuesToProto(vals []*BindVariableValue) ([]*querypb.Value, error) { | ||
values := make([]*querypb.Value, len(vals)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
values := make([]*querypb.Value, len(vals)) | |
values := make([]*querypb.Value, 0, len(vals)) |
This needs to be allocated with length 0 or else you'll end up appending to a slice populated with zero values up front.
@@ -90,6 +99,9 @@ func registerStreamLogFlags(fs *pflag.FlagSet) { | |||
// QueryLogFormat controls the format of the query log (either text or json) | |||
fs.StringVar(&queryLogFormat, "querylog-format", queryLogFormat, "format for query logs (\"text\" or \"json\")") | |||
|
|||
// QueryLogJSONFormat controls whether the new querylog json format is used | |||
fs.BoolVar(&queryLogJSONV2, "querylog-json-v2", false, "use v2 format for querylog-format=json") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we already have --querylog-format=json
, why not --querylog-format=json-v2
rather than a new flag?
@@ -36,14 +36,13 @@ import ( | |||
querypb "vitess.io/vitess/go/vt/proto/query" | |||
) | |||
|
|||
// LogStats records the stats for a single vtgate query |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep this.
@@ -52,13 +51,21 @@ type LogStats struct { | |||
PlanTime time.Duration | |||
ExecuteTime time.Duration | |||
CommitTime time.Duration | |||
Error error | |||
Error error `json:",omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems incorrect. Did you intend for -
?
TablesUsed []string | ||
SessionUUID string | ||
CachedPlan bool | ||
ActiveKeyspace string // ActiveKeyspace is the selected keyspace `use ks` | ||
} | ||
|
||
type LogStatsJSON struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name with JSONv2 to separate from the base implementation?
PlanType string | ||
OriginalSQL string | ||
BindVariables map[string]*querypb.BindVariable | ||
rewrittenSqls []string | ||
BindVariables map[string]streamlog.BindVariable `json:",omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as prior.
TransactionID int64 | ||
ReservedID int64 | ||
Error error | ||
Error error `json:",omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as prior.
Method string | ||
TabletType string | ||
StmtType string | ||
SQL string | ||
BindVariables map[string]*querypb.BindVariable | ||
BindVariables map[string]streamlog.BindVariable `json:",omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See below.
Oy my bad, I completely overlooked this PR when I began working on #15539 so there's an obvious conflict. That PR is now merged and the Because of this, I'd like to see the changes in this PR implemented using the same style of "log builder" that I've merged. I think all the proposed syntax changes here are perfectly reasonable, and shipping them behind a @timvaillancourt: I'm gonna try to find some time this week (maybe Friday morning since it's a day off) to port this PR to the new builder API. If this is urgent to you, please feel free to do it yourself and ask me any questions! Otherwise I don't mind picking up the work here because, again, we've found out how critical performance-wise is the logging path. |
@vmg 👋 long time! No problem, I'm excited to see performance improvements - I suspected this code path could be optimized
Yeah, that's a big drawback for this PR 👎. Perhaps this is why
This isn't urgent but happy to help somehow! If you have some time to describe what you had planned I can offload this optionally |
Signed-off-by: Vilius Okockis <[email protected]>
https://github.com/vitessio/vitess/pull/15539/files#diff-3c26e166a7ecbd772596c75f87d03b0dc849be5a5a443b8429be0e28a170519cR66-R68 |
@timvaillancourt from our perspective, it is good enough to encode only JSON string values, so this is my backport on top of almost zero allocation logger. encode benchimport (
"bytes"
"context"
"fmt"
"testing"
"time"
"vitess.io/vitess/go/logstats"
"vitess.io/vitess/go/streamlog"
querypb "vitess.io/vitess/go/vt/proto/query"
"vitess.io/vitess/go/vt/vtgate"
)
func BenchmarkLogStatsFormat(b *testing.B) {
b.ReportAllocs()
stats := vtgate.NewLogStats(context.Background(), "test", "sql1", "suuid", nil)
stats.StartTime = time.Date(2017, time.January, 1, 1, 2, 3, 0, time.UTC)
stats.EndTime = time.Date(2017, time.January, 1, 1, 2, 4, 1234, time.UTC)
stats.Table = "ks1.tbl1,ks2.tbl2"
stats.TabletType = "MASTER"
stats.Keyspace = "db"
stats.BindVariables = map[string]*querypb.BindVariable{
"Type_BINARY, ": {Type: querypb.Type_BINARY, Value: []byte("a")}, // encoded base64
"Type_BIT, ": {Type: querypb.Type_BIT, Value: []byte("1")}, // encoded base64
"Type_BLOB, ": {Type: querypb.Type_BLOB, Value: []byte("a")}, // encoded base64
"Type_CHAR, ": {Type: querypb.Type_CHAR, Value: []byte("a")}, // encoded base64
"Type_DATE, ": {Type: querypb.Type_DATE, Value: []byte("2012-02-24")}, // encoded base64
"Type_DATETIME,": {Type: querypb.Type_DATETIME, Value: []byte("2012-02-24 23:19:43")}, // encoded base64
"Type_DECIMAL, ": {Type: querypb.Type_DECIMAL, Value: []byte("1.00")}, // encoded base64
"Type_ENUM, ": {Type: querypb.Type_ENUM, Value: []byte("a")}, // encoded base64
"Type_FLOAT32, ": {Type: querypb.Type_FLOAT32, Value: []byte("1.00")},
"Type_FLOAT64, ": {Type: querypb.Type_FLOAT64, Value: []byte("1.00")},
"Type_INT8, ": {Type: querypb.Type_INT8, Value: []byte("1")},
"Type_INT16, ": {Type: querypb.Type_INT16, Value: []byte("1")},
"Type_INT24, ": {Type: querypb.Type_INT24, Value: []byte("1")},
"Type_INT32, ": {Type: querypb.Type_INT32, Value: []byte("1")},
"Type_INT64, ": {Type: querypb.Type_INT64, Value: []byte("1")},
"Type_SET, ": {Type: querypb.Type_SET, Value: []byte("a")}, // encoded base64
"Type_TEXT, ": {Type: querypb.Type_TEXT, Value: []byte("a")}, // encoded base64
"Type_TIME, ": {Type: querypb.Type_TIME, Value: []byte("23:19:43")}, // encoded base64
"Type_TIMESTAMP": {Type: querypb.Type_TIMESTAMP, Value: []byte("2012-02-24 23:19:43")}, // encoded base64
"Type_UINT8, ": {Type: querypb.Type_UINT8, Value: []byte("1")},
"Type_UINT16, ": {Type: querypb.Type_UINT16, Value: []byte("1")},
"Type_UINT24, ": {Type: querypb.Type_UINT24, Value: []byte("1")},
"Type_UINT32, ": {Type: querypb.Type_UINT32, Value: []byte("1")},
"Type_UINT64, ": {Type: querypb.Type_UINT64, Value: []byte("1")},
"Type_VARBINARY": {Type: querypb.Type_VARBINARY, Value: []byte("a")}, // encoded base64
"Type_VARCHAR, ": {Type: querypb.Type_VARCHAR, Value: []byte("a")}, // encoded base64
"Type_YEAR, ": {Type: querypb.Type_YEAR, Value: []byte("1")}, // encoded base64
"Type_TUPLE,1 ": {Type: querypb.Type_TUPLE, Values: []*querypb.Value{{Type: querypb.Type_FLOAT64, Value: []byte("1")}, {Type: querypb.Type_FLOAT64, Value: []byte("2")}}},
"Type_TUPLE,2 ": {Type: querypb.Type_TUPLE, Values: []*querypb.Value{{Type: querypb.Type_INT64, Value: []byte("1")}, {Type: querypb.Type_INT64, Value: []byte("2")}}},
"Type_TUPLE,3 ": {Type: querypb.Type_TUPLE, Values: []*querypb.Value{{Type: querypb.Type_UINT64, Value: []byte("1")}, {Type: querypb.Type_UINT64, Value: []byte("2")}}},
"Type_TUPLE,4 ": {Type: querypb.Type_TUPLE, Values: []*querypb.Value{{Type: querypb.Type_VARBINARY, Value: []byte("aa")}, {Type: querypb.Type_VARBINARY, Value: []byte("bb")}}},
"Type_TUPLE,5 ": {Type: querypb.Type_TUPLE, Values: []*querypb.Value{{Type: querypb.Type_VARBINARY, Value: []byte("aa")}, {Type: querypb.Type_INT64, Value: []byte("1")}}},
}
fullBindParams := true
remoteAddr, username := stats.RemoteAddrUsername()
*streamlog.QueryLogFormat = streamlog.QueryLogFormatJSON
*streamlog.QueryLogJSONV2 = true
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
var w bytes.Buffer
log := logstats.NewLogger()
log.Init(*streamlog.QueryLogFormat == streamlog.QueryLogFormatJSON)
log.Key("Method")
log.StringUnquoted(stats.Method)
log.Key("RemoteAddr")
log.StringUnquoted(remoteAddr)
log.Key("Username")
log.StringUnquoted(username)
log.Key("ImmediateCaller")
log.StringSingleQuoted(stats.ImmediateCaller())
log.Key("Effective Caller")
log.StringSingleQuoted(stats.EffectiveCaller())
log.Key("Start")
log.Time(stats.StartTime)
log.Key("End")
log.Time(stats.EndTime)
log.Key("TotalTime")
log.Duration(stats.TotalTime())
log.Key("PlanTime")
log.Duration(stats.PlanTime)
log.Key("ExecuteTime")
log.Duration(stats.ExecuteTime)
log.Key("CommitTime")
log.Duration(stats.CommitTime)
log.Key("StmtType")
log.StringUnquoted(stats.StmtType)
log.Key("SQL")
log.String(stats.SQL)
log.Key("BindVars")
if *streamlog.QueryLogJSONV2 {
log.BindVariablesV2(stats.BindVariables, fullBindParams)
} else {
log.BindVariables(stats.BindVariables, fullBindParams)
}
log.Key("ShardQueries")
log.Uint(stats.ShardQueries)
log.Key("RowsAffected")
log.Uint(stats.RowsAffected)
log.Key("Error")
log.String(stats.ErrorStr())
log.Key("TabletType")
log.String(stats.TabletType)
log.Key("SessionUUID")
log.String(stats.SessionUUID)
log.Key("InTransaction")
log.Bool(stats.InTransaction)
log.Key("Table")
log.String(stats.Table)
log.Key("Keyspace")
log.String(stats.Keyspace)
log.Flush(&w)
}
})
} old benchimport (
"bytes"
"context"
"fmt"
"testing"
"time"
"vitess.io/vitess/go/sqltypes"
"vitess.io/vitess/go/streamlog"
querypb "vitess.io/vitess/go/vt/proto/query"
"vitess.io/vitess/go/vt/vtgate"
)
func BenchmarkLogStatsFormat(b *testing.B) {
b.ReportAllocs()
stats := vtgate.NewLogStats(context.Background(), "test", "sql1", "suuid", nil)
stats.StartTime = time.Date(2017, time.January, 1, 1, 2, 3, 0, time.UTC)
stats.EndTime = time.Date(2017, time.January, 1, 1, 2, 4, 1234, time.UTC)
stats.Table = "ks1.tbl1,ks2.tbl2"
stats.TabletType = "MASTER"
stats.Keyspace = "db"
stats.BindVariables = map[string]*querypb.BindVariable{
"Type_BINARY, ": {Type: querypb.Type_BINARY, Value: []byte("a")},
"Type_BIT, ": {Type: querypb.Type_BIT, Value: []byte("1")},
"Type_BLOB, ": {Type: querypb.Type_BLOB, Value: []byte("a")},
"Type_CHAR, ": {Type: querypb.Type_CHAR, Value: []byte("a")},
"Type_DATE, ": {Type: querypb.Type_DATE, Value: []byte("2012-02-24")},
"Type_DATETIME,": {Type: querypb.Type_DATETIME, Value: []byte("2012-02-24 23:19:43")},
"Type_DECIMAL, ": {Type: querypb.Type_DECIMAL, Value: []byte("1.00")},
"Type_ENUM, ": {Type: querypb.Type_ENUM, Value: []byte("a")},
"Type_FLOAT32, ": {Type: querypb.Type_FLOAT32, Value: []byte("1.00")},
"Type_FLOAT64, ": {Type: querypb.Type_FLOAT64, Value: []byte("1.00")},
"Type_INT8, ": {Type: querypb.Type_INT8, Value: []byte("1")},
"Type_INT16, ": {Type: querypb.Type_INT16, Value: []byte("1")},
"Type_INT24, ": {Type: querypb.Type_INT24, Value: []byte("1")},
"Type_INT32, ": {Type: querypb.Type_INT32, Value: []byte("1")},
"Type_INT64, ": {Type: querypb.Type_INT64, Value: []byte("1")},
"Type_SET, ": {Type: querypb.Type_SET, Value: []byte("a")},
"Type_TEXT, ": {Type: querypb.Type_TEXT, Value: []byte("a")},
"Type_TIME, ": {Type: querypb.Type_TIME, Value: []byte("23:19:43")},
"Type_TIMESTAMP": {Type: querypb.Type_TIMESTAMP, Value: []byte("2012-02-24 23:19:43")},
"Type_UINT8, ": {Type: querypb.Type_UINT8, Value: []byte("1")},
"Type_UINT16, ": {Type: querypb.Type_UINT16, Value: []byte("1")},
"Type_UINT24, ": {Type: querypb.Type_UINT24, Value: []byte("1")},
"Type_UINT32, ": {Type: querypb.Type_UINT32, Value: []byte("1")},
"Type_UINT64, ": {Type: querypb.Type_UINT64, Value: []byte("1")},
"Type_VARBINARY": {Type: querypb.Type_VARBINARY, Value: []byte("a")},
"Type_VARCHAR, ": {Type: querypb.Type_VARCHAR, Value: []byte("a")},
"Type_YEAR, ": {Type: querypb.Type_YEAR, Value: []byte("1")},
"Type_TUPLE,1 ": {Type: querypb.Type_TUPLE, Values: []*querypb.Value{{Type: querypb.Type_FLOAT64, Value: []byte("1")}, {Type: querypb.Type_FLOAT64, Value: []byte("2")}}},
"Type_TUPLE,2 ": {Type: querypb.Type_TUPLE, Values: []*querypb.Value{{Type: querypb.Type_INT64, Value: []byte("1")}, {Type: querypb.Type_INT64, Value: []byte("2")}}},
"Type_TUPLE,3 ": {Type: querypb.Type_TUPLE, Values: []*querypb.Value{{Type: querypb.Type_UINT64, Value: []byte("1")}, {Type: querypb.Type_UINT64, Value: []byte("2")}}},
"Type_TUPLE,4 ": {Type: querypb.Type_TUPLE, Values: []*querypb.Value{{Type: querypb.Type_VARBINARY, Value: []byte("aa")}, {Type: querypb.Type_VARBINARY, Value: []byte("bb")}}},
"Type_TUPLE,5 ": {Type: querypb.Type_TUPLE, Values: []*querypb.Value{{Type: querypb.Type_VARBINARY, Value: []byte("aa")}, {Type: querypb.Type_INT64, Value: []byte("1")}}},
}
fullBindParams := true
remoteAddr, username := stats.RemoteAddrUsername()
*streamlog.QueryLogFormat = streamlog.QueryLogFormatJSON
b.RunParallel(func(pb *testing.PB) {
for pb.Next() {
var w bytes.Buffer
formattedBindVars := sqltypes.FormatBindVariables(
stats.BindVariables,
fullBindParams,
*streamlog.QueryLogFormat == streamlog.QueryLogFormatJSON,
)
var fmtString string
switch *streamlog.QueryLogFormat {
case streamlog.QueryLogFormatText:
fmtString = "%v\t%v\t%v\t'%v'\t'%v'\t%v\t%v\t%.6f\t%.6f\t%.6f\t%.6f\t%v\t%q\t%v\t%v\t%v\t%q\t%q\t%q\t%q\t%t\t%q\t\n"
case streamlog.QueryLogFormatJSON:
fmtString = "{\"Method\": %q, \"RemoteAddr\": %q, \"Username\": %q, \"ImmediateCaller\": %q, \"Effective Caller\": %q, \"Start\": \"%v\", \"End\": \"%v\", \"TotalTime\": %.6f, \"PlanTime\": %v, \"ExecuteTime\": %v, \"CommitTime\": %v, \"StmtType\": %q, \"SQL\": %q, \"BindVars\": %v, \"ShardQueries\": %v, \"RowsAffected\": %v, \"Error\": %q, \"Keyspace\": %q, \"Table\": %q, \"TabletType\": %q, \"InTransaction\": %t, \"SessionUUID\": %q}\n"
}
fmt.Fprintf(
&w,
fmtString,
stats.Method,
remoteAddr,
username,
stats.ImmediateCaller(),
stats.EffectiveCaller(),
stats.StartTime.Format("2006-01-02 15:04:05.000000"),
stats.EndTime.Format("2006-01-02 15:04:05.000000"),
stats.TotalTime().Seconds(),
stats.PlanTime.Seconds(),
stats.ExecuteTime.Seconds(),
stats.CommitTime.Seconds(),
stats.StmtType,
stats.SQL,
formattedBindVars,
stats.ShardQueries,
stats.RowsAffected,
stats.ErrorStr(),
stats.Keyspace,
stats.Table,
stats.TabletType,
stats.InTransaction,
stats.SessionUUID,
)
}
})
}
Basically, only JSON string values add allocations. Tuples do not. |
Signed-off-by: Vilius Okockis <[email protected]>
This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:
If no action is taken within 7 days, this PR will be closed. |
This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:
If no action is taken within 7 days, this PR will be closed. |
This PR was closed because it has been stale for 7 days with no activity. |
Description
This PR adds a new version 2.0 format for
--querylog-format=json
. This new format is opt-in with the flag--querylog-json-v2
, however I propose the new format replaces the current format in later releasesThe motivation is to resolve cases where the current format produces invalid JSON caused by certain bind variables values that break
json
's spec (bug confirmed/identified by Vinted and Slack in prod - Slack 🧵)The version 2.0 format resolves the bind-variable breakage by letting
encoding/json
library marshal the[]byte
value as base64, which is default behaviour in the library. The original implementation builds json manually usingfmt.Sprintf(...)
, printing[]byte
as%q
, which explains how we got here. The new format is a breaking change to users of--querylog-format=json
, however, which is why I added this behind a flagBreaking changes:
encoding/json
)encoding/json
+ unmarshal using theLogStatsJSON
struct (or the value field only as[]byte
) won't need to know it was base64Effective Caller
field renamed toEffectiveCaller
, to be consistent withImmediateCaller
and other camel-cased fieldsStart
andEnd
renamed toStartTime
andEndTime
respectively, for clarity and to be consistent with the otherTime
-suffixed vars (eg:PlanTime
/ExecuteTime
)CachedPlan
key/value added to newvttablet
formatnil
/""
they are not added to the json (viajson:",omitempty"
)The existing
text
andjson
(without--querylog-json-v2
flag) formats should be unaffected by this changeExample
vtgate
log (pretty-printed, without redaction):Example
vttablet
log (pretty-printed, without redaction):Related Issue(s)
Checklist
Deployment Notes