
PWX-37959: honour force flag over node status #2479

Merged
4 commits merged into libopenstorage:release-9.9 from sanjain-px:master on Sep 19, 2024

Conversation

@sanjain-px (Collaborator) commented Aug 30, 2024

What this PR does / why we need it:
This PR allows deleting nodes irrespective of their status when the force flag is passed.
This allows deletion of faulty nodes that are online but cannot be used.
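Since the repo is Go, the intent can be sketched as below. This is a minimal illustration of the semantics only, not the actual diff: `canRemoveNode` is a hypothetical helper, while `api.Node`, `api.Status_STATUS_OFFLINE`, and the `forceRemove` boolean plumbed through the cluster remove path are existing openstorage names.

```go
package cluster

import (
	"fmt"

	"github.com/libopenstorage/openstorage/api"
)

// canRemoveNode is a hypothetical helper that captures the intended rule:
// a node that is not offline may only be removed when forceRemove is set.
func canRemoveNode(node *api.Node, forceRemove bool) error {
	if node.Status != api.Status_STATUS_OFFLINE && !forceRemove {
		return fmt.Errorf("node %s is not offline; pass --force to remove it anyway", node.Id)
	}
	// With forceRemove set, the status check is bypassed and removal proceeds.
	return nil
}
```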

Which issue(s) this PR fixes (optional)
PWX-37959

Testing Notes

  1. Delete faulty node
pxctl status
Status: PX is operational
Telemetry: Disabled or Unhealthy
Metering: Disabled or Unhealthy
License: Trial (expires in 30 days)
Node ID: f88492c5-1f7b-4868-bb28-c6495d1e2f87
        IP: <ip>
        Local Storage Pool: 1 pool
        POOL    IO_PRIORITY     RAID_LEVEL      USABLE  USED    STATUS  ZONE    REGION
        0       HIGH            raid0           381 GiB 12 GiB  Online  default default
        Local Storage Devices: 3 devices
        Device  Path            Media Type              Size            Last-Scan
        0:1     /dev/sdc2       STORAGE_MEDIUM_SSD      125 GiB         29 Aug 24 07:16 CDT
        0:2     /dev/sdd        STORAGE_MEDIUM_SSD      128 GiB         29 Aug 24 07:16 CDT
        0:3     /dev/sde        STORAGE_MEDIUM_SSD      128 GiB         29 Aug 24 07:16 CDT
        total                   -                       381 GiB
        Cache Devices:
         * No cache devices
        Kvdb Device:
        Device Path     Size
        /dev/sdb        64 GiB
         * Internal kvdb on this node is using this dedicated kvdb device to store its data.
        Journal Device:
        1       /dev/sdc1       STORAGE_MEDIUM_SSD      3.0 GiB
Cluster Summary
        Cluster ID: instant-tb-91137
        Cluster UUID: a0dd6862-7c42-477b-aeb3-244093383483
        Scheduler: kubernetes
        Total Nodes: 3 node(s) with storage (3 online)
        IP              ID                                      SchedulerNodeName          Auth             StorageNode     Used            Capacity        Status  StorageStatus   Version             Kernel                          OS
        Unavailable     fb1a5758-e374-4024-8e74-22cd668b0058    N/A                        Unknown          Yes             Unavailable     Unavailable     Online  Up              Unavailable Unavailable                     Unavailable
        <ip>   f88492c5-1f7b-4868-bb28-c6495d1e2f87    ip-.pwx.purestorage.com        Disabled        Yes             12 GiB          381 GiB         Online  Up (This node)      3.2.0.0-db23d8e 5.14.0-284.23.1.el9_2.x86_64    Red Hat Enterprise Linux 9.2 (Plow)
        Unavailable     39689fa6-6075-406c-9e39-88ca27739f05    N/A                        Unknown          Yes             Unavailable     Unavailable     Online  Up              Unavailable Unavailable                     Unavailable
Global Storage Pool
        Total Used      :  12 GiB
        Total Capacity  :  381 GiB
Warning: Error occurred during capacity calculation. The overall capacity and usage numbers may not be accurate.
  2. PX asked for confirmation before deleting:
pxctl cluster delete fb1a5758-e374-4024-8e74-22cd668b0058 -f
This is a disruptive operation.
Volumes which reside only on this node or which have latest data only on this node will be irrevocably deleted or reset to a previous state.
Are you sure you want to proceed ? (Y/N): Y
Node fb1a5758-e374-4024-8e74-22cd668b0058 successfully deleted.
  3. Node removed:
pxctl status
Status: PX is operational
Telemetry: Disabled or Unhealthy
Metering: Disabled or Unhealthy
License: Trial (expires in 30 days)
Node ID: f88492c5-1f7b-4868-bb28-c6495d1e2f87
        IP: <ip>
        Local Storage Pool: 1 pool
        POOL    IO_PRIORITY     RAID_LEVEL      USABLE  USED    STATUS  ZONE    REGION
        0       HIGH            raid0           381 GiB 12 GiB  Online  default default
        Local Storage Devices: 3 devices
        Device  Path            Media Type              Size            Last-Scan
        0:1     /dev/sdc2       STORAGE_MEDIUM_SSD      125 GiB         29 Aug 24 07:16 CDT
        0:2     /dev/sdd        STORAGE_MEDIUM_SSD      128 GiB         29 Aug 24 07:16 CDT
        0:3     /dev/sde        STORAGE_MEDIUM_SSD      128 GiB         29 Aug 24 07:16 CDT
        total                   -                       381 GiB
        Cache Devices:
         * No cache devices
        Kvdb Device:
        Device Path     Size
        /dev/sdb        64 GiB
         * Internal kvdb on this node is using this dedicated kvdb device to store its data.
        Journal Device:
        1       /dev/sdc1       STORAGE_MEDIUM_SSD      3.0 GiB
Cluster Summary
        Cluster ID: instant-tb-91137
        Cluster UUID: a0dd6862-7c42-477b-aeb3-244093383483
        Scheduler: kubernetes
        Total Nodes: 2 node(s) with storage (2 online)
        IP              ID                                      SchedulerNodeName          Auth             StorageNode     Used            Capacity        Status  StorageStatus   Version             Kernel                          OS
        <ip>   f88492c5-1f7b-4868-bb28-c6495d1e2f87    ip-.pwx.purestorage.com        Disabled        Yes             12 GiB          381 GiB         Online  Up (This node)      3.2.0.0-db23d8e 5.14.0-284.23.1.el9_2.x86_64    Red Hat Enterprise Linux 9.2 (Plow)
        Unavailable     39689fa6-6075-406c-9e39-88ca27739f05    N/A                        Unknown          Yes             Unavailable     Unavailable     Online  Up              Unavailable Unavailable                     Unavailable
Global Storage Pool
        Total Used      :  12 GiB
        Total Capacity  :  381 GiB
Warning: Error occurred during capacity calculation. The overall capacity and usage numbers may not be accurate.
  4. Observed PX status for ~15 minutes; PX remained healthy for the entire duration.

pr-verify output:

 ~/go/src/github.com/libopenstorage/openstorage   master ±  git stash
Saved working directory and index state WIP on master: a2f15f65 added cluster listner mock
 ⚡ root@ip-10-13-188-35  ~/go/src/github.com/libopenstorage/openstorage   master  make pr-verify
go fmt  | grep -v "api.pb.go" | wc -l | grep "^0";
package github.com/libopenstorage/openstorage: no Go files in /root/go/src/github.com/libopenstorage/openstorage
0
go run tools/sdkver/sdkver.go --check-major=0 --check-minor=101
0.101.54
git-validation -run DCO,short-subject
INFO[0000] using commit range: bc42ffa7..a2f15f65
 * a2f15f65 "added cluster listner mock" ... PASS
 * 69efee0e "minor changes" ... PASS
 * 76e95206 "minor changes" ... PASS
 * 534111b6 "PWX-37959: honour force flag over node status" ... PASS
make docker-proto
make[1]: Entering directory '/root/go/src/github.com/libopenstorage/openstorage'
docker pull quay.io/openstorage/osd-proto:pre-gomodules
pre-gomodules: Pulling from openstorage/osd-proto
Digest: sha256:76b09a1a0231f9cf78f22eecd243d7ddb2bcc464e0cea9493a2d08dfe534ef14
Status: Image is up to date for quay.io/openstorage/osd-proto:pre-gomodules
quay.io/openstorage/osd-proto:pre-gomodules
docker run \
        --privileged --rm \
        -v /root/go/src/github.com/libopenstorage/openstorage:/go/src/github.com/libopenstorage/openstorage \
        -e "GOPATH=/go" \
        -e "DOCKER_PROTO=yes" \
        -e "PATH=/bin:/usr/bin:/usr/local/bin:/go/bin:/usr/local/go/bin" \
        quay.io/openstorage/osd-proto:pre-gomodules \
                make proto mockgen
>>> Generating protobuf definitions from api/api.proto
protoc -I /go/src/github.com/libopenstorage/openstorage \
        -I /usr/local/include \
        -I /go/src/github.com/grpc-ecosystem/grpc-gateway/third_party/googleapis \
        --go_out=plugins=grpc:. \
        /go/src/github.com/libopenstorage/openstorage/api/api.proto
protoc -I /go/src/github.com/libopenstorage/openstorage \
        -I /usr/local/include \
        -I /go/src/github.com/grpc-ecosystem/grpc-gateway/third_party/googleapis \
        --grpc-gateway_out=logtostderr=true:. \
        /go/src/github.com/libopenstorage/openstorage/api/api.proto
protoc -I /go/src/github.com/libopenstorage/openstorage \
        -I /usr/local/include \
        -I /go/src/github.com/grpc-ecosystem/grpc-gateway/third_party/googleapis \
        --swagger_out=logtostderr=true:/go/src/github.com/libopenstorage/openstorage/api/server/sdk \
        /go/src/github.com/libopenstorage/openstorage/api/api.proto
>>> Upgrading swagger 2.0 to openapi 3.0
mv api/server/sdk/api/api.swagger.json api/server/sdk/api/20api.swagger.json
swagger2openapi api/server/sdk/api/20api.swagger.json -o api/server/sdk/api/api.swagger.json
rm -f api/server/sdk/api/20api.swagger.json
>>> Generating grpc protobuf definitions from pkg/flexvolume/flexvolume.proto
protoc -I/usr/local/include -I/go/src/github.com/libopenstorage/openstorage -I/go/src/github.com/grpc-ecosystem/grpc-gateway/third_party/googleapis --go_out=plugins=grpc:. /go/src/github.com/libopenstorage/openstorage/pkg/flexvolume/flexvolume.proto
protoc -I/usr/local/include -I/go/src/github.com/libopenstorage/openstorage -I/go/src/github.com/grpc-ecosystem/grpc-gateway/third_party/googleapis --grpc-gateway_out=logtostderr=true:. /go/src/github.com/libopenstorage/openstorage/pkg/flexvolume/flexvolume.proto
>>> Generating protobuf definitions from pkg/jsonpb/testing/testing.proto
protoc -I /go/src/github.com/libopenstorage/openstorage /go/src/github.com/libopenstorage/openstorage/pkg/jsonpb/testing/testing.proto --go_out=plugins=grpc:.
>>> Updating SDK versions
go run tools/sdkver/sdkver.go --swagger api/server/sdk/api/api.swagger.json
0.101.54
go get github.com/golang/mock/gomock
go get github.com/golang/mock/mockgen || echo "ignoring go get build error"
package io/fs: unrecognized import path "io/fs" (import path does not begin with hostname)
ignoring go get build error
cd /go/src/github.com/golang/mock/mockgen && git checkout v1.2.0 && go install github.com/golang/mock/mockgen
Note: checking out 'v1.2.0'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 51421b9 mockgen: use Controller.Helper() in generated mocks
mockgen -destination=api/mock/mock_storagepool.go -package=mock github.com/libopenstorage/openstorage/api OpenStoragePoolServer,OpenStoragePoolClient
mockgen -destination=api/mock/mock_cluster.go -package=mock github.com/libopenstorage/openstorage/api OpenStorageClusterServer,OpenStorageClusterClient
mockgen -destination=api/mock/mock_node.go -package=mock github.com/libopenstorage/openstorage/api OpenStorageNodeServer,OpenStorageNodeClient
mockgen -destination=api/mock/mock_diags.go -package=mock github.com/libopenstorage/openstorage/api OpenStorageDiagsServer,OpenStorageDiagsClient
mockgen -destination=api/mock/mock_volume.go -package=mock github.com/libopenstorage/openstorage/api OpenStorageVolumeServer,OpenStorageVolumeClient
mockgen -destination=api/mock/mock_watch.go -package=mock github.com/libopenstorage/openstorage/api OpenStorageWatchServer,OpenStorageWatchClient,OpenStorageWatch_WatchClient,OpenStorageWatch_WatchServer
mockgen -destination=api/mock/mock_bucket.go -package=mock github.com/libopenstorage/openstorage/api OpenStorageBucketServer,OpenStorageBucketClient
mockgen -destination=cluster/mock/cluster.mock.go -package=mock github.com/libopenstorage/openstorage/cluster Cluster
mockgen -destination=cluster/mock/cluster_listener.mock.go -package=mock github.com/libopenstorage/openstorage/cluster ClusterListener
mockgen -destination=api/mock/mock_fstrim.go -package=mock github.com/libopenstorage/openstorage/api OpenStorageFilesystemTrimServer,OpenStorageFilesystemTrimClient
mockgen -destination=api/mock/mock_fscheck.go -package=mock github.com/libopenstorage/openstorage/api OpenStorageFilesystemCheckServer,OpenStorageFilesystemCheckClient
mockgen -destination=api/mock/mock_defrag.go -package=mock github.com/libopenstorage/openstorage/api OpenStorageFilesystemDefragServer,OpenStorageFilesystemDefragClient
mockgen -destination=api/server/mock/mock_schedops_k8s.go -package=mock github.com/portworx/sched-ops/k8s/core Ops
mockgen -destination=volume/drivers/mock/driver.mock.go -package=mock github.com/libopenstorage/openstorage/volume VolumeDriver
mockgen -destination=bucket/drivers/mock/bucket_driver.mock.go -package=mock github.com/libopenstorage/openstorage/bucket BucketDriver
make[1]: Leaving directory '/root/go/src/github.com/libopenstorage/openstorage'
git diff  | wc -l | grep "^0"
0
hack/check-api-version.sh
fatal: ambiguous argument 'release-9.0..HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
git grep -rw GPL vendor | grep LICENSE | egrep -v "yaml.v2" | wc -l | grep "^0"
0
hack/check-registered-rest.sh
+ cat api/api.pb.go
+ grep func Register.*
+ wc -l
+ registerFuncs=25
+ cat api/server/sdk/rest_gateway.go
+ grep Register.*Handler,
+ wc -l
+ registered=25
+ [ 25 != 25 ]
+ echo All REST handlers are registered in api/server/sdk/rest_gateway.go.
All REST handlers are registered in api/server/sdk/rest_gateway.go.

make osd-tests docker-test > test_op.txt
test_op.txt


github-actions bot commented Sep 3, 2024

This PR is stale because it has been in review for 3 days with no activity.

@zoxpx (Contributor) commented Sep 3, 2024

How did you get the "faulty node" into the system, for your testing?

Also, could you estimate how long it would take to introduce a unit test for this PR? If it's relatively small, I think you should add the UT to this PR -- otherwise, if it'll take half a week or longer, we should probably ask @harsh-px and Srikanth for an exception.


github-actions bot commented Sep 8, 2024

This PR is stale because it has been in review for 3 days with no activity.

@sanjain-px sanjain-px changed the base branch from master to release-9.9 September 9, 2024 12:24
@sanjain-px sanjain-px changed the base branch from release-9.9 to ln/release-9.9 September 9, 2024 12:26
@sanjain-px sanjain-px force-pushed the master branch 2 times, most recently from 2d20853 to 534111b on September 9, 2024 12:43
@sanjain-px sanjain-px changed the base branch from ln/release-9.9 to release-9.9 September 9, 2024 12:44
@sanjain-px (Collaborator, Author) commented Sep 9, 2024

@zoxpx

How did you get the "faulty node" into the system, for your testing?

There was an ongoing issue where, after upgrading PX from 3.1.x to 3.2.0, the upgraded node became unavailable even though its status remained Online.
So I triggered the upgrade and, as soon as the first node was upgraded, scaled the operator down to 0, which stopped the rest of the rollout.

Also, could you estimate how long it would take to introduce a unit test for this PR?

Added a UT for this PR.
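For context, a test of that behaviour could look roughly like the sketch below. It exercises the hypothetical `canRemoveNode` helper sketched in the description above rather than the real ClusterManager internals (the PR's commits add a ClusterListener mock, presumably used by the actual UT), so treat it purely as an illustration of the expected semantics.

```go
package cluster

import (
	"testing"

	"github.com/libopenstorage/openstorage/api"
)

// TestForceFlagOverridesNodeStatus sketches the expected behaviour: an online
// node is rejected without the force flag and accepted with it. It targets the
// hypothetical canRemoveNode helper above, not the actual ClusterManager.
func TestForceFlagOverridesNodeStatus(t *testing.T) {
	online := &api.Node{Id: "node-1", Status: api.Status_STATUS_OK}

	if err := canRemoveNode(online, false); err == nil {
		t.Fatal("expected an error when removing an online node without --force")
	}
	if err := canRemoveNode(online, true); err != nil {
		t.Fatalf("force removal of an online node should succeed, got: %v", err)
	}
}
```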

Signed-off-by: sanjain <[email protected]>

This PR is stale because it has been in review for 3 days with no activity.

Signed-off-by: sanjain <[email protected]>
@pnookala-px pnookala-px merged commit 24e88ce into libopenstorage:release-9.9 Sep 19, 2024
1 check failed
sanjain-px added a commit that referenced this pull request Sep 19, 2024
* PWX-37959: honour force flag over node status

Signed-off-by: sanjain <[email protected]>

* minor changes

Signed-off-by: sanjain <[email protected]>

* minor changes

Signed-off-by: sanjain <[email protected]>

* added cluster listner mock

Signed-off-by: sanjain <[email protected]>

---------

Signed-off-by: sanjain <[email protected]>
sanjain-px added a commit that referenced this pull request Sep 25, 2024
sanjain-px added a commit that referenced this pull request Sep 25, 2024
sanjain-px added a commit that referenced this pull request Sep 25, 2024