Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand FAILED_PRECONDITION error in NodeExpandVolume RPC call #431

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion spec.md
Original file line number Diff line number Diff line change
Expand Up @@ -2523,12 +2523,16 @@ If plugin has `STAGE_UNSTAGE_VOLUME` node capability then:

Otherwise `NodeExpandVolume` MUST be called after successful `NodePublishVolume`.

A plugin that has `STAGE_UNSTAGE_VOLUME` node capability and supports `NodeExpandVolume` ONLY after `NodeStageVolume` and before `NodePublishVolume` but not after `NodePublishVolume` may return `FAILED_PRECONDITION` error code if `NodeExpandVolume` is called after `NodePublishVolume` - CO MUST NOT retry if volume is node-published.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to say CO MUST NOT retry if volume is node-published? Because that means offline is not supported?

Suggested change
A plugin that has `STAGE_UNSTAGE_VOLUME` node capability and supports `NodeExpandVolume` ONLY after `NodeStageVolume` and before `NodePublishVolume` but not after `NodePublishVolume` may return `FAILED_PRECONDITION` error code if `NodeExpandVolume` is called after `NodePublishVolume` - CO MUST NOT retry if volume is node-published.
An SP that has `STAGE_UNSTAGE_VOLUME` node capability and supports `NodeExpandVolume` ONLY after `NodeStageVolume` and before `NodePublishVolume` but not after `NodePublishVolume` may return `FAILED_PRECONDITION` error code if `NodeExpandVolume` is called after `NodePublishVolume` and, in response, the CO SHOULD ensure that the volume is is node-staged but not node-published before retrying with exponential backoff.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did mean "CO MUST NOT retry if volume is node-published". The reason being - if a SP does not allow expansion after node-publish and volume already has been node-published, the only way NodeExpandVolume can succeed after retry is - if they NodeUnpublish the volume and the only way CO can retry expansion is on another node after node-stage but before node-publish. For CO it is already too late to retry expansion on this node because NodeUnpublish is very likely to be a unsafe operation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because that means offline is not supported?

I think it does not mean that - it means that CO must try expansion on another node after node-stage and before node-publish. It just means it can't happen on this node. Having said that - there is a slight problem with language:

If plugin has STAGE_UNSTAGE_VOLUME node capability then:

  • NodeExpandVolume MUST be called after successful NodeStageVolume.
  • NodeExpandVolume MAY be called before or after NodePublishVolume.

I know Kubernetes always calls NodeExpandVolume after NodeStageVolume if NodeExpand is supported but in general a different CO may choose to interpret this differently and may skip calling NodeExpand between node-stage and node-publish and only call NodeExpand after node-publish in which case "offline" expansion may not work on that particular CO. Even then I think recovery from volume in-use error on node shouldn't be "retry with exponential backoff" because it is highly problematic to node-unpublish a volume or if volume is readonly on a node, make it RW.

I think we have two options:

  1. Make no spec change for this particular behaviour in NodeExpandVolume. The downside might be - another CO may interpret this spec differently and implement call to NodeExpandVolume differently and plugins might be dependent on k8s behaviour (although I don't know any plugins which want this behaviour).
  2. Make the language change around when NdoeExpandVolume can be called. so something like - "it is desirable to call NodeExpandVolume before NodePublish to permit expansion of volumes which only permit expansion after node-stage and before node-publish and not after node-publish"


If a plugin only supports expansion via the `VolumeExpansion.OFFLINE` capability, then the volume MUST first be taken offline and expanded via `ControllerExpandVolume` (see `ControllerExpandVolume` for more details), and then node-staged or node-published before it can be expanded on the node via `NodeExpandVolume`.

The `staging_target_path` field is not required, for backwards compatibility, but the CO SHOULD supply it.
Plugins can use this field to determine if `volume_path` is where the volume is published or staged,
and setting this field to non-empty allows plugins to function with less stored state on the node.

If a plugin does not support expansion of `readonly` volumes it may return `FAILED_PRECONDITION` error code and CO MUST NOT retry.

```protobuf
message NodeExpandVolumeRequest {
// The ID of the volume. This field is REQUIRED.
Expand Down Expand Up @@ -2576,7 +2580,7 @@ message NodeExpandVolumeResponse {
|-----------------------|-----------|-----------------------|-----------------------------------|
| Exceeds capabilities | 3 INVALID_ARGUMENT | Indicates that CO has specified capabilities not supported by the volume. | Caller MAY verify volume capabilities by calling ValidateVolumeCapabilities and retry with matching capabilities. |
| Volume does not exist | 5 NOT FOUND | Indicates that a volume corresponding to the specified volume_id does not exist. | Caller MUST verify that the volume_id is correct and that the volume is accessible and has not been deleted before retrying with exponential back off. |
| Volume in use | 9 FAILED_PRECONDITION | Indicates that the volume corresponding to the specified `volume_id` could not be expanded because it is node-published or node-staged and the underlying filesystem does not support expansion of published or staged volumes. | Caller MUST NOT retry. |
| Volume in use or readonly | 9 FAILED_PRECONDITION | Indicates that the volume corresponding to the specified `volume_id` could not be expanded because it is node-published or node-staged or readonly and the storage provider does not support expansion of published or staged or readonly volumes. | Caller MUST NOT retry. |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| Volume in use or readonly | 9 FAILED_PRECONDITION | Indicates that the volume corresponding to the specified `volume_id` could not be expanded because it is node-published or node-staged or readonly and the storage provider does not support expansion of published or staged or readonly volumes. | Caller MUST NOT retry. |
| Volume in use or readonly | 9 FAILED_PRECONDITION | Indicates that the volume corresponding to the specified `volume_id` could not be expanded because it is node-published or node-staged or readonly and the storage provider does not support expansion of published or staged or readonly volumes. | Caller SHOULD ensure that volume is not node-published before retrying with exponential back off. |

| Unsupported capacity_range | 11 OUT_OF_RANGE | Indicates that the capacity range is not allowed by the Plugin. More human-readable information MAY be provided in the gRPC `status.message` field. | Caller MUST fix the capacity range before retrying. |

## Protocol
Expand Down