-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add parameters for SBD watchdog and msgwait timeouts #62
base: main
Are you sure you want to change the base?
Conversation
- rename _manage_multiple_sbd to _manage_sbd as multiple devices can now be managed by crmsh * remove unneeded workaround * add new workaround for timeouts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @yeoldegrove ,
First of all, thank you for this contribution. It is really appreciated, more if this changes are helpful for production setups.
Anyway, I have some topics that I would like to discuss. Everything listed in the comments.
Something important, and I personally know, is that is the sbd
command changes the sbd configuration file (/etc/sysconfig/sbd
) with this new parameters, or if this doesn't even matter.
PD: @diegoakechi Can we confirm that the usage of multiple sbd disks is available and backported in SLE12/15 versions?
@@ -313,7 +315,11 @@ def _crm_init( | |||
if quiet: | |||
cmd = '{cmd} -q'.format(cmd=cmd) | |||
|
|||
return __salt__['cmd.retcode'](cmd) | |||
return_code = __salt__['cmd.retcode'](cmd) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks quite dangerous. I'm wondering why crm cluster init
should return an error code if we don't change the call. Do you mean for versions where multiple sbd devices are not supported?
What do you mean with your next comment?
sbd timeouts are not supported in any version
return_code = __salt__['cmd.retcode'](cmd) | ||
# Workaround as long as setting sbd timeouts is not supported by "crm cluster init" | ||
if not return_code and sbd: | ||
_manage_sbd(sbd, sbd_dev, sbd_timeout_watchdog, sbd_timeout_msgwait) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we should replace this variables by a kwargs item. How many more timeouts or options are available in sbd? If we want to add more options in the future having kwargs would make sense
@@ -349,45 +357,32 @@ def _ha_cluster_init( | |||
name = __salt__['network.get_hostname']() | |||
addr = __salt__['network.interface_ip'](interface or 'eth0') | |||
_set_corosync_unicast(addr, name) | |||
# Workaround as long as setting sbd timeouts is not supported by "ha-cluster-init" | |||
if not return_code and sbd: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same than before. I don't understand in which case this would return an error code
if not sbd_enabled or not sbd_dev or len(sbd_dev) == 1: | ||
return sbd_enabled, sbd_dev | ||
if not sbd_enabled or not sbd_dev or not sbd_timeout_watchdog or not sbd_timeout_msgwait: | ||
return sbd_enabled, sbd_dev, sbd_timeout_watchdog, sbd_timeout_msgwait |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need to return all of these values. They were used just because sbd_dev
was converted to a list if a string was received. Actually, you never use the return values on the calls
|
||
sbd_str = ' '.join(['-d {}'.format(sbd) for sbd in sbd_dev]) | ||
cmd = 'sbd {disks} create'.format(disks=sbd_str) | ||
cmd = 'sbd {disks} create -1 {sbd_timeout_watchdog} -4 {sbd_timeout_msgwait}'.format(disks=sbd_str, sbd_timeout_watchdog=sbd_timeout_watchdog, sbd_timeout_msgwait=sbd_timeout_msgwait) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we are assuming that we are going to receive something always, What if the timeouts are set to None
(None are the module defaults in fact).?
We need to generated the string based on that. If the values is different than None, append the new parameters.
I noticed that the default values for watchdog (5) and msgwait (10) timeouts for SBD device creation are used. This is not feasible for production setups.
These commits add parameters for SBD watchdog and msgwait timeouts.
It has to be implemented in kind of a hacky way by recreating the sbd devices after
crm cluster init
created them initially. This is due to the limitations ofcrm cluster init
."_manage_multiple_sbd" was renamed to "_manage_sbd" as multiple devices can now be managed by crmsh.
I tested this in setup with a single and with multiple SBD devices. Also the new parameters are optional and sane defaults will be used.