repairReplication deadlock fix #177

Merged
vmogilev merged 10 commits into slack-vitess-r14.0.5 from vm_debug-prs on Jan 27, 2024

vmogilev commented Jan 17, 2024

Description

This PR fixes a bug that made PRS (PlannedReparentShard) slow, with hangs of 17-18s. The hangs were caused by repairReplication deadlocking the shard, as described here: https://slack-pde.slack.com/archives/C8EJ0PTPF/p1705042056083619?thread_ts=1696929303.041269&cid=C8EJ0PTPF

Testing

Extensively tested over a 2-week period here and here.

Backport/Upstream Plans

See this thread.

Rollout

After making a new build off of slack-vitess-r14.0.5:

  1. Soak the new build in dev (all keyspaces).
  2. Wait until the vtgate v14 rollout is 100% complete, to avoid introducing any new variables.
  3. Run the vttablet canary phase in prod using the vtops-go upgrade plan.
  4. Roll out the vttablet default build in prod via u22/high-uptime recycling.

vmogilev marked this pull request as ready for review January 22, 2024 19:16
vmogilev requested a review from a team as a code owner January 22, 2024 19:16
vmogilev merged commit a81a245 into slack-vitess-r14.0.5 Jan 27, 2024
241 checks passed
vmogilev deleted the vm_debug-prs branch January 27, 2024 01:02

3 participants