VDiffs on reverse Reshard workflows never start #14400

Closed
rohit-nayak-ps opened this issue Oct 31, 2023 · 0 comments · Fixed by #14413

rohit-nayak-ps commented Oct 31, 2023

Overview of the Issue

VDiffs that are created for reverse workflows do not start because they cannot find any tablets to stream from on the original source:

  • In VReplication: Make Source Tablet Selection More Robust #13582 we changed the tablet picker to only pick tablets which were Serving.
  • New shards are created with IsPrimaryServing=true, so VDiffs for forward workflows work: the picker does pick tablets from the new shards.
  • However, when we switch primary traffic we set IsPrimaryServing=false for the shards we are switching away from. This means that the picker for reverse workflows does not find any tablets to source the target streams from.
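
The interaction described in the list above can be sketched in Go. This is a simplified model, not the actual Vitess tablet picker: the `shard` and `tablet` types, the `serving` method, and `pickServing` are hypothetical names introduced for illustration, and the rule that a tablet counts as serving only when its shard has IsPrimaryServing=true is a simplifying assumption based on the description in this issue.

```go
package main

import "fmt"

// shard is a hypothetical, simplified stand-in for a shard record.
type shard struct {
	name             string
	isPrimaryServing bool
}

// tablet is a hypothetical, simplified stand-in for a tablet in a shard.
type tablet struct {
	alias string
	shard *shard
}

// serving models the assumption stated above: a tablet is treated as
// serving only while its shard has isPrimaryServing set.
func (t tablet) serving() bool {
	return t.shard.isPrimaryServing
}

// pickServing returns the aliases of the candidates that a
// serving-only picker would accept.
func pickServing(candidates []tablet) []string {
	var picked []string
	for _, t := range candidates {
		if t.serving() {
			picked = append(picked, t.alias)
		}
	}
	return picked
}

func main() {
	// State after primary traffic has been switched, per this issue:
	// the original shard "0" has isPrimaryServing=false, the new
	// shard "-80" has isPrimaryServing=true.
	oldShard := &shard{name: "0", isPrimaryServing: false}
	newShard := &shard{name: "-80", isPrimaryServing: true}

	onOld := []tablet{{alias: "zone1-201", shard: oldShard}}
	onNew := []tablet{{alias: "zone1-302", shard: newShard}}

	fmt.Println("accepted on shard", oldShard.name, ":", pickServing(onOld))
	fmt.Println("accepted on shard", newShard.name, ":", pickServing(onNew))
}
```

In this model the picker accepts no candidate on shard "0", which corresponds to the reverse-workflow VDiff never starting, while the candidate on shard "-80" is accepted.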

Note that the reverse workflow itself is still running properly.

Found while debugging failed vdiffs for the reverse workflows in #14327

Reproduction Steps

  • Run the local example up to and including ./305_switch_writes.sh.
  • Start a vdiff for the reverse workflow with
    vtctlclient VDiff -- --tablet_types=replica customer.cust2cust_reverse create
  • Check that it doesn't progress using vtctlclient VDiff customer.cust2cust_reverse show last

Binary Version

vttablet version Version: 19.0.0-SNAPSHOT (Git revision eddb7da44636994d1064906ca20d06543e6044a0 branch 'main') built on Tue Oct 31 11:48:06 CET 2023 by [email protected] using go1.21.0 darwin/arm64

Operating System and Environment details

Main: eddb7da44636994d1064906ca20d06543e6044a0

Log Fragments

The tablet picker logs for the reverse workflow show that the candidate tablet is treated as unhealthy (its health response does not include serving:true):

I1031 11:45:34.844917   98871 tablet_picker.go:441] Connecting to tablet for tablet picker: alias:{cell:"zone1" uid:201} hostname:"rss-laptop.home" port_map:{key:"grpc" value:16201} port_map:{key:"vt" value:15201} keyspace:"customer" shard:"0" type:REPLICA mysql_hostname:"rss-laptop.home" mysql_port:17201 default_conn_collation:255
I1031 11:45:34.845345   98871 tablet_picker.go:446] Checking tablet health for tablet picker: alias:{cell:"zone1" uid:201} hostname:"rss-laptop.home" port_map:{key:"grpc" value:16201} port_map:{key:"vt" value:15201} keyspace:"customer" shard:"0" type:REPLICA mysql_hostname:"rss-laptop.home" mysql_port:17201 default_conn_collation:255
I1031 11:45:34.860529   98871 tablet_picker.go:448] Got health response for tablet picker: alias:{cell:"zone1" uid:201} hostname:"rss-laptop.home" port_map:{key:"grpc" value:16201} port_map:{key:"vt" value:15201} keyspace:"customer" shard:"0" type:REPLICA mysql_hostname:"rss-laptop.home" mysql_port:17201 default_conn_collation:255, shr target:{keyspace:"customer" shard:"0" tablet_type:REPLICA} realtime_stats:{replication_lag_seconds:138} tablet_alias:{cell:"zone1" uid:201}
I1031 11:45:34.860686   98871 tablet_picker.go:453] Tablet picker found unhealthy tablet: alias:{cell:"zone1" uid:201} hostname:"rss-laptop.home" port_map:{key:"grpc" value:16201} port_map:{key:"vt" value:15201} keyspace:"customer" shard:"0" type:REPLICA mysql_hostname:"rss-laptop.home" mysql_port:17201 default_conn_collation:255

However, for forward workflows we get a healthy tablet (its health response includes serving:true):

I1031 11:51:41.024681   12776 tablet_picker.go:441] Connecting to tablet for tablet picker: alias:{cell:"zone1" uid:302} hostname:"rss-laptop.home" port_map:{key:"grpc" value:16302} port_map:{key:"vt" value:15302} keyspace:"customer" shard:"-80" key_range:{end:"\x80"} type:RDONLY mysql_hostname:"rss-laptop.home" mysql_port:17302 default_conn_collation:255
I1031 11:51:41.024734   12776 tablet_picker.go:446] Checking tablet health for tablet picker: alias:{cell:"zone1" uid:302} hostname:"rss-laptop.home" port_map:{key:"grpc" value:16302} port_map:{key:"vt" value:15302} keyspace:"customer" shard:"-80" key_range:{end:"\x80"} type:RDONLY mysql_hostname:"rss-laptop.home" mysql_port:17302 default_conn_collation:255
I1031 11:51:41.029337   12776 tablet_picker.go:448] Got health response for tablet picker: alias:{cell:"zone1" uid:302} hostname:"rss-laptop.home" port_map:{key:"grpc" value:16302} port_map:{key:"vt" value:15302} keyspace:"customer" shard:"-80" key_range:{end:"\x80"} type:RDONLY mysql_hostname:"rss-laptop.home" mysql_port:17302 default_conn_collation:255, shr target:{keyspace:"customer" shard:"-80" tablet_type:RDONLY} serving:true realtime_stats:{replication_lag_seconds:80} tablet_alias:{cell:"zone1" uid:302}
I1031 11:51:41.029352   12776 tablet_picker.go:450] Tablet picker found healthy tablet: alias:{cell:"zone1" uid:302} hostname:"rss-laptop.home" port_map:{key:"grpc" value:16302} port_map:{key:"vt" value:15302} keyspace:"customer" shard:"-80" key_range:{end:"\x80"} type:RDONLY mysql_hostname:"rss-laptop.home" mysql_port:17302 default_conn_collation:255
I1031 11:51:41.029379   12776 tablet_picker.go:352] Tablet picker found a healthy serving tablet for streaming: alias:{cell:"zone1" uid:302} hostname:"rss-laptop.home" port_map:{key:"grpc" value:16302} port_map:{key:"vt" value:15302} keyspace:"customer" shard:"-80" key_range:{end:"\x80"} type:RDONLY mysql_hostname:"rss-laptop.home" mysql_port:17302 default_conn_collation:255
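
The difference between the two log fragments above is the serving:true field in the "Got health response" line. As a rough aid for scanning such logs, here is a minimal Go sketch that checks for that field. The function name `healthResponseServing` is hypothetical, the sample lines are abbreviated from the logs above, and the check assumes that a missing serving field in the printed response means the tablet is not serving.

```go
package main

import (
	"fmt"
	"strings"
)

// healthResponseServing inspects a "Got health response" log line.
// ok is false when the line has no ", shr " section at all.
// serving is true only if the response section contains serving:true.
// Assumption: an absent serving field means the tablet is not serving.
func healthResponseServing(line string) (serving bool, ok bool) {
	_, shr, found := strings.Cut(line, ", shr ")
	if !found {
		return false, false
	}
	for _, field := range strings.Fields(shr) {
		if field == "serving:true" {
			return true, true
		}
	}
	return false, true
}

func main() {
	// Abbreviated versions of the health response lines in this issue.
	reverse := `Got health response for tablet picker: alias:{cell:"zone1" uid:201} shard:"0" type:REPLICA, shr target:{keyspace:"customer" shard:"0" tablet_type:REPLICA} realtime_stats:{replication_lag_seconds:138} tablet_alias:{cell:"zone1" uid:201}`
	forward := `Got health response for tablet picker: alias:{cell:"zone1" uid:302} shard:"-80" type:RDONLY, shr target:{keyspace:"customer" shard:"-80" tablet_type:RDONLY} serving:true realtime_stats:{replication_lag_seconds:80} tablet_alias:{cell:"zone1" uid:302}`

	for name, line := range map[string]string{"reverse": reverse, "forward": forward} {
		serving, ok := healthResponseServing(line)
		fmt.Printf("%s: health response found=%v serving=%v\n", name, ok, serving)
	}
}
```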
@rohit-nayak-ps rohit-nayak-ps self-assigned this Oct 31, 2023
@rohit-nayak-ps rohit-nayak-ps changed the title VDiffs on reverse workflows never start VDiffs on reverse Reshard workflows never start Oct 31, 2023