Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PWX-38596 : After disabling PX Security STC goes into DEGRADED state due to rolling update failed #1648

Merged
merged 2 commits into from
Aug 19, 2024

Conversation

nikita-bhatia
Copy link
Contributor

@nikita-bhatia nikita-bhatia commented Aug 16, 2024

What this PR does / why we need it:
After disabling PX Security STC goes into DEGRADED state due to rolling update failed: cannot upgrade nodes in parallel: failed to get non overlapping nodes: rpc error: code = PermissionDenied desc = Access denied without authentication token

Reason : When security was disabled from previously enabled value, the update on storagenodes is still not done at the point when API call is made to PX to get storage nodes, so even thought security is disabled from STC, storagenode is still not in sync and expects token to be passed by the operator during GRPC call.

Note that `skipSecurityCheck' variable is removed which was previously added in #1539 for improvement of logic as discussed in PR.

Fix
The code change does following to decide if token is required or not to for API calls.

  • fetches the secret
  • if secret is not found and security is disabled on STC, proceeds without token.
  • if secret is not found and security is enabled on STC, returns error because secret is expected in this case

Which issue(s) this PR fixes (optional)
Closes #
https://purestorage.atlassian.net/browse/PWX-38596
https://purestorage.atlassian.net/browse/PWX-37232

Special notes for your reviewer:

  • verified on vanilla k8s cluster, by updating the operator image with this fix, the STC comes out of Degraded state and all nodes were upgraded

Copy link

codecov bot commented Aug 16, 2024

Codecov Report

Attention: Patch coverage is 87.50000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 79.79%. Comparing base (35b6e99) to head (a8b4dee).
Report is 1 commits behind head on release-24.2.0.

Files Patch % Lines
drivers/storage/portworx/util/util.go 75.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@                Coverage Diff                 @@
##           release-24.2.0    #1648      +/-   ##
==================================================
- Coverage           79.79%   79.79%   -0.01%     
==================================================
  Files                  61       61              
  Lines               18834    18831       -3     
==================================================
- Hits                15029    15026       -3     
  Misses               2715     2715              
  Partials             1090     1090              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

if errors.IsNotFound(err) {
if SecurityEnabled(cluster) {
// when security is enabled in STC, and secret is not found, return error
return nil, fmt.Errorf("failed to get portworx apps secret: %v", err.Error())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: include namespace/name of the secret in the error message
Also, use ("...: %w", err)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering why is this secret called apps secret? Shouldn't it just be a security token secret or security context secret?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are specifically trying to fetch apps-secret key here inside px-system-secrets secret, so the error message is specifying that.

@pureneelesh
Copy link
Contributor

If they disable security AND delete the secret, we will still run into an issue?

@nikita-bhatia
Copy link
Contributor Author

If they disable security AND delete the secret, we will still run into an issue?

Ideally No, because

  • since when security is disabled, automatically secret is not deleted and if it is deleted manually then the workaround would be to delete enable the security and disable again since it is not a valid scenario.

I tried deleting the secret manually but all the nodes update went fine. This might depend upon the condition what is happening first.

So to conclude, in a valid cluster state, this would not create problem.

@nikita-bhatia nikita-bhatia merged commit 30ed9ab into release-24.2.0 Aug 19, 2024
10 of 11 checks passed
@nikita-bhatia nikita-bhatia deleted the PWX-38596 branch August 19, 2024 08:48
mdaigavane-px pushed a commit that referenced this pull request Aug 21, 2024
…due to rolling update failed (#1648)

* PWX-38596 : Skip adding token in context if security is disabled in stc

* update failing tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants