-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix race condition between service account availability and webhook invocation #236
Conversation
91754ec
to
79fcca1
Compare
79fcca1
to
ea82b2f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some minor questions/suggestions, but the overall approach looks good. I'm happy to work with you on getting this merged 😁
Thank you for the contribution!
Hi @modulitos. This is good news. I appreciate seeing this eventually to be merged. In my eyes, every single comment of yours seems to be very valid and I'm going to resolve them step by step. I'm also going to add a test which tests exactly the fixed behavior. Currently I actually don't know how long it will take but I keep you posted. |
e3fe4ce
to
dccceff
Compare
@modulitos, I have also added a test which tests the asynchronous cache notification behavior. But now waiting for your thoughts about the open topics from above to not waste time working in the wrong direction. |
e20c44f
to
18f345b
Compare
Just adding a note here, that an alternative solution for handling cache misses is to fetch and populate the service account from the kubernetes api server. Here's a quick list of tradeoffs from these two approaches:
I'm in favor of proceeding with the lookup grace period in this PR, but open to feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! Thank you for the contribution. Fixes a bug impacting multiple users, without adding overhead to the API server 🎉
In #174 a race condition is mentioned, where the webhook tries to mutate a Pod having a service account reference which is not yet populated to the ServiceAccountCache. This results in the pod being not mutated.
In our scenario this is a matter of a few milliseconds before the cache is properly populated.
As an alternative to #178, this PR tries to fix this by waiting for a certain amount of time for proper ServiceAccountCache population and being notified upon population. Additionally, to wait only for SAs which aren't in the cache at all and not just missing a
RoleARN
, I had to restructure the return values of theServiceAccountCache
methods from multiple return values to structured data in order to transport the required information without adding additional two return values.Jan Roehrich [email protected], Mercedes-Benz Tech Innovation GmbH, legal info/Impressum