[WIP] High: fenced: Registration of a STONITH device without placement constraints fails. #3849
@HideoYamauchi What do you mean by placement constraint? A location constraint (in the |
Sorry.
Yes. The problem seems to occur when no location rule (placement score) is specified. Since there are no issues up to 2.1.9, I think this is some kind of problem with the state transition calculation. Thinking about it a bit more: with just this fix, I did not expect it to work well when -INFINITY constraints are mixed in. Update: I checked, and including a -INFINITY location constraint does not seem to cause any problems.
Best Regards,
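To see which fence devices fall into the "no location constraint" case described above, something like the following could be used. It is only a minimal sketch: the sample CIB fragment, the resource ids and the function name `stonith_without_location` are hypothetical, and it only looks at `rsc_location` elements that name a resource directly through the `rsc` attribute (patterns, resource sets, groups and clones are ignored).

```python
import xml.etree.ElementTree as ET

# Hypothetical CIB fragment for illustration; ids and nodes are made up.
SAMPLE_CIB = """
<cib>
  <configuration>
    <resources>
      <primitive id="fence-a" class="stonith" type="fence_scsi"/>
      <primitive id="fence-b" class="stonith" type="fence_scsi"/>
      <primitive id="dummy" class="ocf" provider="pacemaker" type="Dummy"/>
    </resources>
    <constraints>
      <rsc_location id="loc-fence-a" rsc="fence-a" node="node1" score="100"/>
    </constraints>
  </configuration>
</cib>
"""


def stonith_without_location(cib_xml: str) -> list[str]:
    """Return ids of stonith primitives that no rsc_location names directly."""
    root = ET.fromstring(cib_xml)
    # Fence devices are primitives whose class attribute is "stonith".
    stonith_ids = [
        p.get("id") for p in root.iter("primitive") if p.get("class") == "stonith"
    ]
    # Assumption: only constraints with a plain rsc="..." attribute count.
    constrained = {c.get("rsc") for c in root.iter("rsc_location") if c.get("rsc")}
    return [rsc_id for rsc_id in stonith_ids if rsc_id not in constrained]


print(stonith_without_location(SAMPLE_CIB))
```

In this sample, only `fence-b` has no location constraint, so it is the device that would match the failing case reported here.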
When you don't see the fence resource started, would a fencing action like off or reboot still work?
Thank you for your comment. The problem here is the registration of stonith devices with fenced. I know I'm repeating myself, but this problem is probably a regression. Best Regards,
Yep, that answers my question. It seems to be an issue when fenced calls the scheduler code, not when that code runs in the context of the scheduler process (maybe there as well, but it may be worth checking whether the context is what matters).
@HideoYamauchi Do you know if it would be possible to write a scheduler regression test to reproduce the failure? If so, we could use that to |
Thanks for your comment. If you're referring to CTS, I'm not familiar with the details of CTS. However, I think the problem can be reproduced by following the steps below.
Step 1) Assuming that this is the initial cluster startup, clear everything in /var/lib/pacemaker/.
Step 2) Configure the initial cluster with pcs cluster start --all.
fence_scsi.xml
Best Regards,
By the way, this problem does not seem to occur if cib.xml already exists in /var/lib/pacemaker. Best Regards,
That may be a pointer toward the cause being the context in which the scheduler code is run.
could be the empty I've been busy with other things and I have not personally looked at this in any detail yet. I just skimmed over the CIB that Hideo provided, along with the comment that the problem doesn't occur when Are you able to fill in the nodes section in the |
I agree with you that the cause is the timing and the presence or absence of content in the node section of cib.xml. Best Regards,
A cib-change will immediately trigger fenced to react while a scheduler run may not be triggered right away ...
If the cib.xml you import contains a node section, the cluster will be configured successfully.
Best Regards,
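Since the observation above hinges on whether the imported cib.xml has a populated node section, a quick check before importing may help when reproducing the issue. This is a minimal sketch only: the two sample documents, the node ids and names, and the helper name `has_node_entries` are hypothetical, and it assumes the usual `cib/configuration/nodes` layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical CIB snippets for illustration only.
EMPTY_NODES_CIB = "<cib><configuration><nodes/></configuration></cib>"
POPULATED_CIB = """
<cib>
  <configuration>
    <nodes>
      <node id="1" uname="node1"/>
      <node id="2" uname="node2"/>
    </nodes>
  </configuration>
</cib>
"""


def has_node_entries(cib_xml: str) -> bool:
    """Return True if the CIB has at least one <node> under <nodes>."""
    root = ET.fromstring(cib_xml)
    nodes = root.find("./configuration/nodes")
    # A missing <nodes> element is treated the same as an empty one.
    return nodes is not None and len(nodes.findall("node")) > 0


print(has_node_entries(EMPTY_NODES_CIB))
print(has_node_entries(POPULATED_CIB))
```

A CIB for which this returns False would correspond to the case where the problem was reported to occur.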
Hi All, This is a significant issue for our users.
I will close this PR for now and submit another one. I now understand the difference in the 3.0 series that causes the problem. Many thanks,
Closing this so that it can be resubmitted as a new PR.
Hi All,
Starting with Pacemaker 3.0.0, registration of STONITH devices without placement constraints is not performed, so unfencing of STONITH devices fails.
Pacemaker 2.1.9 and earlier did not have this problem, so it seems that some fix in the 3.0.0 series of Pacemaker is affecting it.
I have not tracked down the change that introduced this, so I am sending a tentative fix. (It is probably a regression in some part of the processing.)
I will leave it to the community to come up with a fix that takes the regression into account.
Best Regards,
Hideo Yamauchi.