log error instead of aborting for restore failures #11

sskaur · 2022-03-16T15:39:06Z

In a gateway group, all gateways should have identical NVMf target configurations. On startup, the management daemon on each gateway should check whether it can restore the group's established NVMf target configuration. Currently, if restoring any component of the target fails, the restore is aborted and the daemon exits. This ensures that a gateway is never available with a target configuration that differs from the others in the group.

As @trociny points out in #9, if an image in Ceph is accidentally deleted, a gateway attempting to restore the corresponding bdev will always fail. This could lead to all gateways within the group dying.

Should the gateway continuously check for availability of images to maintain the correctness of the OMAP specification?

As @trociny suggests, would it be better to log an error message instead of aborting on restore failure? If so, should the gateway keep track of errors so that it does not attempt to restore components that depend on ones that have already failed? For example: restoring subsystem-A fails, so the gateway logs an error and continues restoring other subsystems and components, but skips any namespaces, hosts, and listeners associated with subsystem-A.
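To make the proposal concrete, here is a minimal sketch of the "log and skip dependents" behaviour. Everything in it is hypothetical and for illustration only: `Component`, `restore_all`, and `fake_restore` are not names from the gateway code, and the sketch assumes components are listed with dependencies before their dependents.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("restore-sketch")


@dataclass(frozen=True)
class Component:
    """Hypothetical description of one restorable piece of the target."""
    name: str
    kind: str                 # e.g. "bdev", "subsystem", "namespace"
    depends_on: tuple = ()    # names of components that must restore first


def restore_all(components, restore_fn):
    """Restore components in order, logging failures instead of aborting.

    A component is skipped (and counted as failed) if anything it
    depends on has already failed. Assumes dependencies appear earlier
    in `components` than their dependents.
    """
    failed = set()
    restored = []
    for comp in components:
        blocked_by = [dep for dep in comp.depends_on if dep in failed]
        if blocked_by:
            logger.error("Skipping %s %s: depends on failed %s",
                         comp.kind, comp.name, ", ".join(blocked_by))
            failed.add(comp.name)
            continue
        try:
            restore_fn(comp)
        except Exception as exc:
            logger.error("Failed to restore %s %s: %s",
                         comp.kind, comp.name, exc)
            failed.add(comp.name)
            continue
        restored.append(comp.name)
    return restored, failed


# Demo mirroring the example above: subsystem-A fails to restore.
demo_components = [
    Component("subsystem-A", "subsystem"),
    Component("subsystem-B", "subsystem"),
    Component("ns-A1", "namespace", ("subsystem-A",)),
    Component("listener-B1", "listener", ("subsystem-B",)),
]


def fake_restore(comp):
    """Stand-in for the real restore call; fails only for subsystem-A."""
    if comp.name == "subsystem-A":
        raise RuntimeError("simulated restore failure")


restored, failed = restore_all(demo_components, fake_restore)
```

In this demo, subsystem-B and its listener are restored, while subsystem-A and its namespace end up in `failed`. Whether a gateway in this partially restored state should still serve I/O, given that it no longer matches the rest of the group, is the open question of this issue.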

sskaur mentioned this issue Mar 16, 2022

Add target save and restore functionality #9

Merged

epuertat added this to NVMe-oF May 21, 2023

github-project-automation bot moved this to 🆕 New in NVMe-oF May 21, 2023

leonidc mentioned this issue Aug 7, 2023

GW server logs notice on keep alive timeout and disconnects host from subsystem #161

Closed
