add parameters for SBD watchdog and msgwait timeouts #62

yeoldegrove · 2020-05-13T15:37:52Z

I noticed that the default watchdog (5) and msgwait (10) timeouts are used when the SBD devices are created. These values are not suitable for production setups.

These commits add parameters for the SBD watchdog and msgwait timeouts.
This has to be implemented in a somewhat hacky way: the SBD devices are recreated after crm cluster init has created them initially, because crm cluster init does not support setting these timeouts.

"_manage_multiple_sbd" was renamed to "_manage_sbd" as multiple devices can now be managed by crmsh.

I tested this in setups with a single SBD device and with multiple SBD devices. The new parameters are optional, and sane defaults are used when they are omitted.

- rename _manage_multiple_sbd to _manage_sbd as multiple devices can now be managed by crmsh
  * remove unneeded workaround
  * add new workaround for timeouts

arbulu89

Hi @yeoldegrove,
First of all, thank you for this contribution. It is really appreciated, even more so if these changes are helpful for production setups.
That said, I have some topics I would like to discuss. They are all listed in the comments.
One important point that I am personally unsure about is whether the sbd command updates the sbd configuration file (/etc/sysconfig/sbd) with these new parameters, or whether this even matters.

PS: @diegoakechi Can we confirm that the usage of multiple sbd disks is available and backported in SLE12/15 versions?

arbulu89 · 2020-05-21T13:30:54Z

salt/modules/crmshmod.py

@@ -313,7 +315,11 @@ def _crm_init(
    if quiet:
        cmd = '{cmd} -q'.format(cmd=cmd)

-    return __salt__['cmd.retcode'](cmd)
+    return_code = __salt__['cmd.retcode'](cmd)


This looks quite dangerous. I'm wondering why crm cluster init should return an error code if we don't change the call. Do you mean for versions where multiple sbd devices are not supported?

What do you mean by your next comment?
SBD timeouts are not supported in any version.

arbulu89 · 2020-05-21T13:32:51Z

salt/modules/crmshmod.py

+    return_code = __salt__['cmd.retcode'](cmd)
+    # Workaround as long as setting sbd timeouts is not supported by "crm cluster init"
+    if not return_code and sbd:
+        _manage_sbd(sbd, sbd_dev, sbd_timeout_watchdog, sbd_timeout_msgwait)


I don't know if we should replace these variables with a kwargs item. How many more timeouts or options are available in sbd? If we want to add more options in the future, having kwargs would make sense.
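As a rough illustration of the kwargs idea, the sketch below maps keyword arguments to sbd command-line flags through a lookup table. The helper name `sbd_options_from_kwargs`, the table `SBD_OPTION_FLAGS`, and the keyword names are hypothetical and not part of the module; only the `-1` (watchdog) and `-4` (msgwait) flags are taken from this PR.

```python
# Hypothetical mapping from keyword names to sbd flags.
# Only the two flags used in this PR are listed; more could be added later.
SBD_OPTION_FLAGS = {
    'timeout_watchdog': '-1',
    'timeout_msgwait': '-4',
}


def sbd_options_from_kwargs(**kwargs):
    """Return a list of 'flag value' strings for the options that were set.

    Options whose value is None are skipped, unknown options raise ValueError.
    """
    options = []
    for name in sorted(kwargs):
        if name not in SBD_OPTION_FLAGS:
            raise ValueError('unsupported sbd option: {}'.format(name))
        value = kwargs[name]
        if value is None:
            continue
        options.append('{} {}'.format(SBD_OPTION_FLAGS[name], value))
    return options
```

With this shape, supporting a further option would only need a new entry in the table rather than a new function parameter, at the cost of a less explicit signature.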

arbulu89 · 2020-05-21T13:33:51Z

salt/modules/crmshmod.py

@@ -349,45 +357,32 @@ def _ha_cluster_init(
        name = __salt__['network.get_hostname']()
        addr = __salt__['network.interface_ip'](interface or 'eth0')
        _set_corosync_unicast(addr, name)
+    # Workaround as long as setting sbd timeouts is not supported by "ha-cluster-init"
+    if not return_code and sbd:


Same as before. I don't understand in which case this would return an error code.

arbulu89 · 2020-05-21T13:35:39Z

salt/modules/crmshmod.py

-    if not sbd_enabled or not sbd_dev or len(sbd_dev) == 1:
-        return sbd_enabled, sbd_dev
+    if not sbd_enabled or not sbd_dev or not sbd_timeout_watchdog or not sbd_timeout_msgwait:
+        return sbd_enabled, sbd_dev, sbd_timeout_watchdog, sbd_timeout_msgwait


I think we don't need to return all of these values. They were only returned because sbd_dev was converted to a list if a string was received. In fact, the return values are never used by the callers.

arbulu89 · 2020-05-21T13:37:05Z

salt/modules/crmshmod.py


    sbd_str = ' '.join(['-d {}'.format(sbd) for sbd in sbd_dev])
-    cmd = 'sbd {disks} create'.format(disks=sbd_str)
+    cmd = 'sbd {disks} create -1 {sbd_timeout_watchdog} -4 {sbd_timeout_msgwait}'.format(disks=sbd_str, sbd_timeout_watchdog=sbd_timeout_watchdog, sbd_timeout_msgwait=sbd_timeout_msgwait)


Here we are assuming that we will always receive a value. What if the timeouts are set to None (None is in fact the module default)?
We need to generate the string based on that: if a value is not None, append the corresponding parameter.
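A minimal sketch of what that could look like, assuming a standalone helper (the name `build_sbd_create_cmd` is hypothetical and not in the module). It keeps the option order used in the diff above and only appends `-1` and `-4` when the corresponding timeout is not None.

```python
def build_sbd_create_cmd(sbd_dev, sbd_timeout_watchdog=None, sbd_timeout_msgwait=None):
    """Build the 'sbd ... create' command string.

    sbd_dev may be a single device path or a list of paths.
    Timeout flags are only added when a value was actually provided.
    """
    if isinstance(sbd_dev, str):
        sbd_dev = [sbd_dev]

    parts = ['sbd']
    parts.extend('-d {}'.format(dev) for dev in sbd_dev)
    parts.append('create')

    # Compare against None explicitly so the defaults leave the command untouched
    if sbd_timeout_watchdog is not None:
        parts.append('-1 {}'.format(sbd_timeout_watchdog))
    if sbd_timeout_msgwait is not None:
        parts.append('-4 {}'.format(sbd_timeout_msgwait))

    return ' '.join(parts)
```

Called with both timeouts left at None, this yields the same command as before the change, so existing states would keep their behaviour.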

yeoldegrove added 2 commits May 14, 2020 17:02

- add parameters for sbd watchdog and msgwait timeouts

7818d9b

- rename _manage_multiple_sbd to _manage_sbd as multiple devices can now be managed by crmsh
  * remove unneeded workaround
  * add new workaround for timeouts

add tests for parameters sbd watchdog and msgwait timeouts

0c7cec8

yeoldegrove force-pushed the sbd_timeouts branch from 095c68c to 0c7cec8 Compare May 14, 2020 15:04

yeoldegrove marked this pull request as ready for review May 14, 2020 15:09

arbulu89 self-requested a review May 21, 2020 12:18

arbulu89 requested changes May 21, 2020
