Disable DR when a cluster is not responsive #1139
Testing the non-responsive cluster flow using #1133. Steps:
Actual result: deleting the drpc is stuck. In the ramen hub logs we see:
ManagedClusterViews
We don't have any visibility on the cluster status in dr1 - we simply see the last reported status.
On dr2 we see an error condition when trying to upload data to the s3 store on dr1.
DRCluster
There is no visibility on cluster status in drclusters:
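As a sketch of how one might inspect what the hub actually knows, assuming kubectl access to the hub and that the per-cluster ManagedClusterViews live in the dr1 namespace (an assumption about this setup):

# ManagedClusterViews on the hub for dr1; with dr1 suspended, the
# status here is only the last cached report, not the live state.
kubectl get managedclusterview -n dr1 -o yaml

# DRCluster resources; nothing in the status reflects that dr1
# stopped responding.
kubectl get drcluster -o yaml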
So far we have tested disabling DR when both the primary and secondary clusters are up. In a disaster use case we may need to disable DR when one of the clusters is not responsive. In this case we may not be able to clean up the cluster, or even get its status, using ManagedClusterView. Simulating a non-responsive cluster is easy with virsh:
virsh -c qemu:///system suspend dr1
Recover a cluster:
virsh -c qemu:///system resume dr1
Tested during failover: suspend the cluster before failover, resume it after the application is running on the failover cluster.
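A rough sketch of that test sequence, assuming a DRPC named drpc in the busybox-sample namespace and a DRPolicy with clusters dr1/dr2 (the names are assumptions), and assuming the usual DRPC failover fields:

# Simulate dr1 going down before failover.
virsh -c qemu:///system suspend dr1

# Trigger failover to dr2 by updating the DRPC on the hub.
kubectl patch drpc drpc -n busybox-sample --type merge \
  -p '{"spec": {"action": "Failover", "failoverCluster": "dr2"}}'

# Wait until the application is running on dr2, then bring dr1 back.
virsh -c qemu:///system resume dr1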
Fix
Support marking a drcluster as unavailable. When a cluster is unavailable:
Recommended flow
Alternative flow
If the user forgets to mark a cluster as unavailable before disabling DR, disable DR will be stuck:
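A minimal way to see where the deletion is stuck (assuming the same DRPC name and namespace as above):

# A deletionTimestamp that never clears, with the ramen finalizer still
# present, means the hub is waiting on cleanup from the dead cluster.
kubectl get drpc drpc -n busybox-sample -o yaml | grep -E -A2 'deletionTimestamp|finalizers'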
Marking the cluster as unavailable should fix the issue but may require more manual work.
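Nothing below is implemented yet; as a sketch only, one possible shape for the manual marking is an annotation on the DRCluster (the annotation key here is hypothetical, not an existing API):

# Hypothetical annotation; the real mechanism is part of the proposed fix.
kubectl annotate drcluster dr1 drcluster.ramendr.openshift.io/unavailable=true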
Issues:
Tasks
Similar k8s flows:
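For example (an assumption about which flows are meant), the standard flow for a resource stuck in Terminating because its finalizer can never be satisfied is to remove the finalizer by hand; the equivalent for ramen would be clearing the drpc finalizer once the cluster is known to be gone:

# Force-remove finalizers from a PV stuck in Terminating
# (example-pv is a placeholder name).
kubectl patch pv example-pv --type merge -p '{"metadata": {"finalizers": null}}'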