slack-vitess-r15.0.5: backport v16 vtorc fixes (#258)
* VTOrc running PRS when database_instance empty bug fix. (vitessio#12019)

* feat: convert join with database_instance to a left join and prevent fixes from running if the information from database_instance is unavailable

Signed-off-by: Manan Gupta <[email protected]>

* test: add tests to verify the fix works

Signed-off-by: Manan Gupta <[email protected]>

Signed-off-by: Manan Gupta <[email protected]>

* Timeout Fixes and VTOrc Improvement (vitessio#11881)

* refactor: move tests out of newfeaturestest so that they run on upgrade-downgrade tests too

Signed-off-by: Manan Gupta <[email protected]>

* feat: add failing ers test for handling multiple vttablet failures with default values of flags

Signed-off-by: Manan Gupta <[email protected]>

* feat: add a new lock-timeout flag and use that instead of remote-operation-timeout

Signed-off-by: Manan Gupta <[email protected]>

* feat: augment DownPrimary test to reproduce the issue of VTOrc not handling multiple failures

Signed-off-by: Manan Gupta <[email protected]>

* feat: remove LockShardTimeout configuration from VTOrc and add parallelism to refresh of tablets

Signed-off-by: Manan Gupta <[email protected]>

* log: add more logging lines around ers in vtorc

Signed-off-by: Manan Gupta <[email protected]>

* test: get the test to work

Signed-off-by: Manan Gupta <[email protected]>

* feat: fix usage of wait for replicas timeout

Signed-off-by: Manan Gupta <[email protected]>

* test: fix flags expected output

Signed-off-by: Manan Gupta <[email protected]>

* test: fix race in test now that the function is called in parallel multiple times

Signed-off-by: Manan Gupta <[email protected]>

* feat: fix default of onCloseTimeout to 1 second

Signed-off-by: Manan Gupta <[email protected]>

* test: add failing unit test to refreshTabletsInKeyspaceShard

Signed-off-by: Manan Gupta <[email protected]>

* feat: fix vtorc to not forget a tablet which has been deleted

Signed-off-by: Manan Gupta <[email protected]>

* feat: fix backward compatibility, add tests and release notes docs

Signed-off-by: Manan Gupta <[email protected]>

* test: fix flags output

Signed-off-by: Manan Gupta <[email protected]>

* test: use disable-replication-manager instead of disable-active-reparents to allow vttablets to setup replication when restarted

Signed-off-by: Manan Gupta <[email protected]>

* test: fix flaky test by not checking for an error

Signed-off-by: Manan Gupta <[email protected]>

* feat: handle the case of empty hostname in tablet initialization

Signed-off-by: Manan Gupta <[email protected]>

* feat: update onclose timeout to 10 seconds

Signed-off-by: Manan Gupta <[email protected]>

* test: fix unit test

Signed-off-by: Manan Gupta <[email protected]>

* feat: address review comments

Signed-off-by: Manan Gupta <[email protected]>

* docs: add comments explaining the test functions

Signed-off-by: Manan Gupta <[email protected]>

* feat: add summary docs for 'lock-shard-timeout' deprecation

Signed-off-by: Manan Gupta <[email protected]>

Signed-off-by: Manan Gupta <[email protected]>

* log: also log error in DiscoverInstance when force discovery is specified (vitessio#11936)

Signed-off-by: Manan Gupta <[email protected]>

Signed-off-by: Manan Gupta <[email protected]>

* VTOrc Code Cleanup - generate_base, replace cluster_name with keyspace and shard. (vitessio#12012)

* feat: refactor generate commands of VTOrc to be in a single file

Signed-off-by: Manan Gupta <[email protected]>

* refactor: cleanup create table formatting

Signed-off-by: Manan Gupta <[email protected]>

* feat: cleanup the usage of IsSQLite and IsMySQL

Signed-off-by: Manan Gupta <[email protected]>

* feat: remove unused minimal instance

Signed-off-by: Manan Gupta <[email protected]>

* feat: remove unused table cluster_domain_name

Signed-off-by: Manan Gupta <[email protected]>

* feat: fix vtorc database to store keyspace and shard instead of cluster

Signed-off-by: Manan Gupta <[email protected]>

* feat: remove unused attributes

Signed-off-by: Manan Gupta <[email protected]>

* feat: remove unused cluster domain

Signed-off-by: Manan Gupta <[email protected]>

* feat: change GetClusterName to GetKeyspaceAndShardName

Signed-off-by: Manan Gupta <[email protected]>

* feat: fix insertion into database_instance

Signed-off-by: Manan Gupta <[email protected]>

* feat: fix SnapshotTopologies

Signed-off-by: Manan Gupta <[email protected]>

* feat: remove inject unseen primary and inject seed

Signed-off-by: Manan Gupta <[email protected]>

* feat: remove ClusterName from Instance

Signed-off-by: Manan Gupta <[email protected]>

* feat: fix Audit operations

Signed-off-by: Manan Gupta <[email protected]>

* feat: add Keyspace and Shard to cluster information to replace ClusterName

Signed-off-by: Manan Gupta <[email protected]>

* feat: fix attempt failure detection registration

Signed-off-by: Manan Gupta <[email protected]>

* feat: fix blocked topology recoveries

Signed-off-by: Manan Gupta <[email protected]>

* feat: fix topology recovery

Signed-off-by: Manan Gupta <[email protected]>

* feat: reading recovery instances

Signed-off-by: Manan Gupta <[email protected]>

* feat: fix get replication and analysis

Signed-off-by: Manan Gupta <[email protected]>

* feat: fix bug in query

Signed-off-by: Manan Gupta <[email protected]>

* test: add tests to check that filtering by keyspace works for APIs

Signed-off-by: Manan Gupta <[email protected]>

* feat: remove remaining usages of ClusterName

Signed-off-by: Manan Gupta <[email protected]>

* refactor: fix comment explaining sleep in the test

Signed-off-by: Manan Gupta <[email protected]>

* feat: add code to prevent filtering just by shard and add tests for it

Signed-off-by: Manan Gupta <[email protected]>

Signed-off-by: Manan Gupta <[email protected]>

* Fix insert query of blocked_recovery table in VTOrc (vitessio#12091)

* feat: add failing test and fix the query of insertion

Signed-off-by: Manan Gupta <[email protected]>

* empty-commit

Signed-off-by: Manan Gupta <[email protected]>

Signed-off-by: Manan Gupta <[email protected]>

* Fix: VTOrc forgetting old instances (vitessio#12089)

* test: add a failing test for the case where the port changes for a tablet

Signed-off-by: Manan Gupta <[email protected]>

* feat: fix the issue by adding alias as a unique field

Signed-off-by: Manan Gupta <[email protected]>

* empty-commit

Signed-off-by: Manan Gupta <[email protected]>

Signed-off-by: Manan Gupta <[email protected]>

* Move vtorc from go-sqlite3 to modernc.org/sqlite (vitessio#12214)

* Move vtorc from go-sqlite3 to modernc.org/sqlite

This moves vtorc from the go-sqlite3 library that uses CGO, to use
modernc.org/sqlite which is a pure Go implementation.

vtorc is the only component we have to build with CGO but it's causing
pain for releases since we need to build it against an old Linux for
linking against glibc.

Using modernc.org/sqlite allows for using Go only again and makes all
Vitess components buildable without CGO.

In
https://datastation.multiprocess.io/blog/2022-05-12-sqlite-in-go-with-and-without-cgo.html
someone ran some basic benchmarks. It shows that the pure Go version can
be twice as slow, but the usage of vtorc is very limited and we operate
on small datasets, so I think the performance impact purely of a
somewhat slower sqlite implementation is negligible.

None of this is in a hot query serving path or anything like that, so I
have little concern performance wise.

Signed-off-by: Dirkjan Bussink <[email protected]>

* Fix error handling in RowToArray

Signed-off-by: Dirkjan Bussink <[email protected]>

---------

Signed-off-by: Dirkjan Bussink <[email protected]>

* see if CI passes on v14.0.5 as previous release

Signed-off-by: Tim Vaillancourt <[email protected]>

* Revert "see if CI passes on v14.0.5 as previous release"

This reverts commit 53a0e0c.

---------

Signed-off-by: Manan Gupta <[email protected]>
Signed-off-by: Dirkjan Bussink <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
Co-authored-by: Manan Gupta <[email protected]>
Co-authored-by: Dirkjan Bussink <[email protected]>
3 people authored May 16, 2024
1 parent bde4622 commit 36d8315
Showing 47 changed files with 1,630 additions and 1,880 deletions.
4 changes: 2 additions & 2 deletions go.mod
@@ -48,7 +48,7 @@ require (
github.com/klauspost/compress v1.13.0
github.com/klauspost/pgzip v1.2.4
github.com/krishicks/yaml-patch v0.0.10
github.com/magiconair/properties v1.8.5
github.com/magiconair/properties v1.8.6
github.com/mattn/go-sqlite3 v1.14.16 // indirect
github.com/minio/minio-go v0.0.0-20190131015406-c8a261de75c1
github.com/mitchellh/go-testing-interface v1.14.0 // indirect
@@ -200,7 +200,7 @@ require (
modernc.org/ccgo/v3 v3.16.13 // indirect
modernc.org/libc v1.22.2 // indirect
modernc.org/mathutil v1.5.0 // indirect
modernc.org/memory v1.4.0 // indirect
modernc.org/memory v1.5.0 // indirect
modernc.org/opt v0.1.3 // indirect
modernc.org/strutil v1.1.3 // indirect
modernc.org/token v1.0.1 // indirect
7 changes: 4 additions & 3 deletions go.sum
@@ -507,8 +507,9 @@ github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE=
github.com/krishicks/yaml-patch v0.0.10 h1:H4FcHpnNwVmw8u0MjPRjWyIXtco6zM2F78t+57oNM3E=
github.com/krishicks/yaml-patch v0.0.10/go.mod h1:Sm5TchwZS6sm7RJoyg87tzxm2ZcKzdRE4Q7TjNhPrME=
github.com/magiconair/properties v1.8.0/go.mod h1:PppfXfuXeibc/6YijjN8zIbojt8czPbwD3XqdrwzmxQ=
github.com/magiconair/properties v1.8.5 h1:b6kJs+EmPFMYGkow9GiUyCyOvIwYetYJ3fSaWak/Gls=
github.com/magiconair/properties v1.8.5/go.mod h1:y3VJvCyxH9uVvJTWEGAELF3aiYNyPKd5NZ3oSwXrF60=
github.com/magiconair/properties v1.8.6 h1:5ibWZ6iY0NctNGWo87LalDlEZ6R41TqbbDamhfG/Qzo=
github.com/magiconair/properties v1.8.6/go.mod h1:y3VJvCyxH9uVvJTWEGAELF3aiYNyPKd5NZ3oSwXrF60=
github.com/mailru/easyjson v0.0.0-20160728113105-d5b7844b561a/go.mod h1:C1wdFJiN94OJF2b5HbByQZoLdCWB1Yqtg26g4irojpc=
github.com/mailru/easyjson v0.0.0-20180823135443-60711f1a8329/go.mod h1:C1wdFJiN94OJF2b5HbByQZoLdCWB1Yqtg26g4irojpc=
github.com/mailru/easyjson v0.0.0-20190312143242-1de009706dbe/go.mod h1:C1wdFJiN94OJF2b5HbByQZoLdCWB1Yqtg26g4irojpc=
@@ -1312,8 +1313,8 @@ modernc.org/libc v1.22.2 h1:4U7v51GyhlWqQmwCHj28Rdq2Yzwk55ovjFrdPjs8Hb0=
modernc.org/libc v1.22.2/go.mod h1:uvQavJ1pZ0hIoC/jfqNoMLURIMhKzINIWypNM17puug=
modernc.org/mathutil v1.5.0 h1:rV0Ko/6SfM+8G+yKiyI830l3Wuz1zRutdslNoQ0kfiQ=
modernc.org/mathutil v1.5.0/go.mod h1:mZW8CKdRPY1v87qxC/wUdX5O1qDzXMP5TH3wjfpga6E=
modernc.org/memory v1.4.0 h1:crykUfNSnMAXaOJnnxcSzbUGMqkLWjklJKkBK2nwZwk=
modernc.org/memory v1.4.0/go.mod h1:PkUhL0Mugw21sHPeskwZW4D6VscE/GQJOnIpCnW6pSU=
modernc.org/memory v1.5.0 h1:N+/8c5rE6EqugZwHii4IFsaJ7MUhoWX07J5tC/iI5Ds=
modernc.org/memory v1.5.0/go.mod h1:PkUhL0Mugw21sHPeskwZW4D6VscE/GQJOnIpCnW6pSU=
modernc.org/opt v0.1.3 h1:3XOZf2yznlhC+ibLltsDGzABUGVx8J6pnFMS3E4dcq4=
modernc.org/opt v0.1.3/go.mod h1:WdSiB5evDcignE70guQKxYUl14mgWtbClRi5wmkkTX0=
modernc.org/sqlite v1.20.3 h1:SqGJMMxjj1PHusLxdYxeQSodg7Jxn9WWkaAQjKrntZs=
2 changes: 1 addition & 1 deletion go/cmd/vtorc/status.go
@@ -24,7 +24,7 @@ import (
// addStatusParts adds UI parts to the /debug/status page of VTOrc
func addStatusParts() {
servenv.AddStatusPart("Recent Recoveries", logic.TopologyRecoveriesTemplate, func() any {
recoveries, _ := logic.ReadRecentRecoveries("", false, 0)
recoveries, _ := logic.ReadRecentRecoveries(false, 0)
return recoveries
})
}
3 changes: 2 additions & 1 deletion go/flags/endtoend/vtbackup.txt
@@ -89,6 +89,7 @@ Usage of vtbackup:
--keep-alive-timeout duration Wait until timeout elapses after a successful backup before shutting down.
--keep_logs duration keep logs for this long (using ctime) (zero to keep forever)
--keep_logs_by_mtime duration keep logs for this long (using mtime) (zero to keep forever)
--lock-timeout duration Maximum time for which a shard/keyspace lock can be acquired for (default 45s)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
--log_err_stacks log stack traces for errors
@@ -122,7 +123,7 @@ Usage of vtbackup:
--port int port for the server
--pprof strings enable profiling
--purge_logs_interval duration how often try to remove old logs (default 1h0m0s)
--remote_operation_timeout duration time to wait for a remote operation (default 30s)
--remote_operation_timeout duration time to wait for a remote operation (default 15s)
--restart_before_backup Perform a mysqld clean/full restart after applying binlogs, but before taking the backup. Only makes sense to work around xtrabackup bugs.
--s3_backup_aws_endpoint string endpoint of the S3 backend (region must be provided).
--s3_backup_aws_region string AWS region to use. (default "us-east-1")
5 changes: 3 additions & 2 deletions go/flags/endtoend/vtctld.txt
@@ -1,5 +1,5 @@
Usage of vtctld:
--action_timeout duration time to wait for an action before resorting to force (default 2m0s)
--action_timeout duration time to wait for an action before resorting to force (default 1m0s)
--alsologtostderr log to standard error as well as files
--azblob_backup_account_key_file string Path to a file containing the Azure Storage account key; if this flag is unset, the environment variable VT_AZBLOB_ACCOUNT_KEY will be used as the key itself (NOT a file path).
--azblob_backup_account_name string Azure Storage Account name for backups; if this flag is unset, the environment variable VT_AZBLOB_ACCOUNT_NAME will be used.
@@ -61,6 +61,7 @@ Usage of vtctld:
--keep_logs duration keep logs for this long (using ctime) (zero to keep forever)
--keep_logs_by_mtime duration keep logs for this long (using mtime) (zero to keep forever)
--lameduck-period duration keep running at least this long after SIGTERM before stopping (default 50ms)
--lock-timeout duration Maximum time for which a shard/keyspace lock can be acquired for (default 45s)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
--log_err_stacks log stack traces for errors
@@ -74,7 +75,7 @@ Usage of vtctld:
--pprof strings enable profiling
--proxy_tablets Setting this true will make vtctld proxy the tablet status instead of redirecting to them
--purge_logs_interval duration how often try to remove old logs (default 1h0m0s)
--remote_operation_timeout duration time to wait for a remote operation (default 30s)
--remote_operation_timeout duration time to wait for a remote operation (default 15s)
--s3_backup_aws_endpoint string endpoint of the S3 backend (region must be provided).
--s3_backup_aws_region string AWS region to use. (default "us-east-1")
--s3_backup_aws_retries int AWS request retries. (default -1)
3 changes: 2 additions & 1 deletion go/flags/endtoend/vtgate.txt
@@ -70,6 +70,7 @@ Usage of vtgate:
--keyspaces_to_watch strings Specifies which keyspaces this vtgate should have access to while routing queries or accessing the vschema.
--lameduck-period duration keep running at least this long after SIGTERM before stopping (default 50ms)
--legacy_replication_lag_algorithm Use the legacy algorithm when selecting vttablets for serving. (default true)
--lock-timeout duration Maximum time for which a shard/keyspace lock can be acquired for (default 45s)
--lock_heartbeat_time duration If there is lock function used. This will keep the lock connection active by using this heartbeat (default 5s)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
@@ -134,7 +135,7 @@ Usage of vtgate:
--querylog-format string format for query logs ("text" or "json") (default "text")
--querylog-row-threshold uint Number of rows a query has to return or affect before being logged; not useful for streaming queries. 0 means all queries will be logged.
--redact-debug-ui-queries redact full queries and bind variables from debug UI
--remote_operation_timeout duration time to wait for a remote operation (default 30s)
--remote_operation_timeout duration time to wait for a remote operation (default 15s)
--retry-count int retry count (default 2)
--schema_change_signal Enable the schema tracker; requires queryserver-config-schema-change-signal to be enabled on the underlying vttablets for this to work (default true)
--schema_change_signal_user string User to be used to send down query to vttablet to retrieve schema changes
3 changes: 2 additions & 1 deletion go/flags/endtoend/vtgr.txt
@@ -22,6 +22,7 @@ Usage of vtgr:
-h, --help display usage and exit
--keep_logs duration keep logs for this long (using ctime) (zero to keep forever)
--keep_logs_by_mtime duration keep logs for this long (using mtime) (zero to keep forever)
--lock-timeout duration Maximum time for which a shard/keyspace lock can be acquired for (default 45s)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
--log_err_stacks log stack traces for errors
@@ -31,7 +32,7 @@ Usage of vtgr:
--pprof strings enable profiling
--purge_logs_interval duration how often try to remove old logs (default 1h0m0s)
--refresh_interval duration Refresh interval to load tablets. (default 10s)
--remote_operation_timeout duration time to wait for a remote operation (default 30s)
--remote_operation_timeout duration time to wait for a remote operation (default 15s)
--scan_interval duration Scan interval to diagnose and repair. (default 3s)
--scan_repair_timeout duration Time to wait for a Diagnose and repair operation. (default 3s)
--security_policy string the name of a registered security policy to use for controlling access to URLs - empty means allow all for anyone (built-in policies: deny-all, read-only)
4 changes: 2 additions & 2 deletions go/flags/endtoend/vtorc.txt
@@ -23,7 +23,7 @@ Usage of vtorc:
--keep_logs duration keep logs for this long (using ctime) (zero to keep forever)
--keep_logs_by_mtime duration keep logs for this long (using mtime) (zero to keep forever)
--lameduck-period duration keep running at least this long after SIGTERM before stopping (default 50ms)
--lock-shard-timeout duration Duration for which a shard lock is held when running a recovery (default 30s)
--lock-timeout duration Maximum time for which a shard/keyspace lock can be acquired for (default 45s)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
--log_err_stacks log stack traces for errors
@@ -39,7 +39,7 @@ Usage of vtorc:
--reasonable-replication-lag duration Maximum replication lag on replicas which is deemed to be acceptable (default 10s)
--recovery-period-block-duration duration Duration for which a new recovery is blocked on an instance after running a recovery (default 30s)
--recovery-poll-duration duration Timer duration on which VTOrc polls its database to run a recovery (default 1s)
--remote_operation_timeout duration time to wait for a remote operation (default 30s)
--remote_operation_timeout duration time to wait for a remote operation (default 15s)
--security_policy string the name of a registered security policy to use for controlling access to URLs - empty means allow all for anyone (built-in policies: deny-all, read-only)
--shutdown_wait_time duration Maximum time to wait for VTOrc to release all the locks that it is holding before shutting down on SIGTERM (default 30s)
--snapshot-topology-interval duration Timer duration on which VTOrc takes a snapshot of the current MySQL information it has in the database. Should be in multiple of hours
3 changes: 2 additions & 1 deletion go/flags/endtoend/vttablet.txt
@@ -161,6 +161,7 @@ Usage of vttablet:
--keep_logs duration keep logs for this long (using ctime) (zero to keep forever)
--keep_logs_by_mtime duration keep logs for this long (using mtime) (zero to keep forever)
--lameduck-period duration keep running at least this long after SIGTERM before stopping (default 50ms)
--lock-timeout duration Maximum time for which a shard/keyspace lock can be acquired for (default 45s)
--lock_tables_timeout duration How long to keep the table locked before timing out (default 1m0s)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
@@ -241,7 +242,7 @@ Usage of vttablet:
--redact-debug-ui-queries redact full queries and bind variables from debug UI
--relay_log_max_items int Maximum number of rows for VReplication target buffering. (default 5000)
--relay_log_max_size int Maximum buffer size (in bytes) for VReplication target buffering. If single rows are larger than this, a single row is buffered at a time. (default 250000)
--remote_operation_timeout duration time to wait for a remote operation (default 30s)
--remote_operation_timeout duration time to wait for a remote operation (default 15s)
--replication_connect_retry duration how long to wait in between replica reconnect attempts. Only precise to the second. (default 10s)
--restore_concurrency int (init restore parameter) how many concurrent files to restore at once (default 4)
--restore_from_backup (init restore parameter) will check BackupStorage for a recent backup at startup and start there
11 changes: 11 additions & 0 deletions go/internal/flag/flag.go
@@ -70,6 +70,17 @@ func Parse(fs *flag.FlagSet) {
flag.Parse()
}

// IsFlagProvided returns if the given flag has been provided by the user explicitly or not
func IsFlagProvided(name string) bool {
found := false
flag.Visit(func(f *flag.Flag) {
if f.Name == name {
found = true
}
})
return found
}

// TrickGlog tricks glog into understanding that flags have been parsed.
//
// N.B. Do not delete this function. `glog` is a persnickity package and wants