Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add parameters for SBD watchdog and msgwait timeouts #62

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

yeoldegrove
Copy link
Collaborator

I noticed that the default values for watchdog (5) and msgwait (10) timeouts for SBD device creation are used. This is not feasible for production setups.

These commits add parameters for SBD watchdog and msgwait timeouts.
It has to be implemented in kind of a hacky way by recreating the sbd devices after crm cluster init created them initially. This is due to the limitations of crm cluster init.

"_manage_multiple_sbd" was renamed to "_manage_sbd" as multiple devices can now be managed by crmsh.

I tested this in setup with a single and with multiple SBD devices. Also the new parameters are optional and sane defaults will be used.

- rename _manage_multiple_sbd to _manage_sbd as multiple devices can now be managed by crmsh
  * remove unneeded workaround
  * add new workaround for timeouts
@yeoldegrove yeoldegrove marked this pull request as ready for review May 14, 2020 15:09
@arbulu89 arbulu89 self-requested a review May 21, 2020 12:18
Copy link
Collaborator

@arbulu89 arbulu89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @yeoldegrove ,
First of all, thank you for this contribution. It is really appreciated, more if this changes are helpful for production setups.
Anyway, I have some topics that I would like to discuss. Everything listed in the comments.
Something important, and I personally know, is that is the sbd command changes the sbd configuration file (/etc/sysconfig/sbd) with this new parameters, or if this doesn't even matter.

PD: @diegoakechi Can we confirm that the usage of multiple sbd disks is available and backported in SLE12/15 versions?

@@ -313,7 +315,11 @@ def _crm_init(
if quiet:
cmd = '{cmd} -q'.format(cmd=cmd)

return __salt__['cmd.retcode'](cmd)
return_code = __salt__['cmd.retcode'](cmd)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks quite dangerous. I'm wondering why crm cluster init should return an error code if we don't change the call. Do you mean for versions where multiple sbd devices are not supported?

What do you mean with your next comment?
sbd timeouts are not supported in any version

return_code = __salt__['cmd.retcode'](cmd)
# Workaround as long as setting sbd timeouts is not supported by "crm cluster init"
if not return_code and sbd:
_manage_sbd(sbd, sbd_dev, sbd_timeout_watchdog, sbd_timeout_msgwait)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we should replace this variables by a kwargs item. How many more timeouts or options are available in sbd? If we want to add more options in the future having kwargs would make sense

@@ -349,45 +357,32 @@ def _ha_cluster_init(
name = __salt__['network.get_hostname']()
addr = __salt__['network.interface_ip'](interface or 'eth0')
_set_corosync_unicast(addr, name)
# Workaround as long as setting sbd timeouts is not supported by "ha-cluster-init"
if not return_code and sbd:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same than before. I don't understand in which case this would return an error code

if not sbd_enabled or not sbd_dev or len(sbd_dev) == 1:
return sbd_enabled, sbd_dev
if not sbd_enabled or not sbd_dev or not sbd_timeout_watchdog or not sbd_timeout_msgwait:
return sbd_enabled, sbd_dev, sbd_timeout_watchdog, sbd_timeout_msgwait
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need to return all of these values. They were used just because sbd_dev was converted to a list if a string was received. Actually, you never use the return values on the calls


sbd_str = ' '.join(['-d {}'.format(sbd) for sbd in sbd_dev])
cmd = 'sbd {disks} create'.format(disks=sbd_str)
cmd = 'sbd {disks} create -1 {sbd_timeout_watchdog} -4 {sbd_timeout_msgwait}'.format(disks=sbd_str, sbd_timeout_watchdog=sbd_timeout_watchdog, sbd_timeout_msgwait=sbd_timeout_msgwait)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are assuming that we are going to receive something always, What if the timeouts are set to None (None are the module defaults in fact).?
We need to generated the string based on that. If the values is different than None, append the new parameters.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants