Support Kubernetes workload identity #4
Conversation
Force-pushed from 1be21da to 2eae9c2
Force-pushed from 2eae9c2 to 21baa11
Caveat: unable to use callbacks to assert success because of a race condition.
Everything seems to be working as expected locally. I still need to test it in GKE; I will do that on Monday.
Force-pushed from 51aeb54 to 7a2efaf
Today I have tested this live in our dev environment and everything seems to be working fine, except for one little caveat: the default token_url does not work because nginx is not able to resolve metadata.google.internal. I have tried the following solutions:
Both of these work some, or even most, of the time, but intermittently fail. However, using the internal name server address instead of metadata.google.internal in the token_url seems to always work.
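For reference, one possible workaround (not necessarily the one used here) is to bypass DNS entirely and point the token URL at the metadata server's well-known link-local address, 169.254.169.254, which is what metadata.google.internal resolves to on GCP. The config field below is the token_url mentioned in this PR; the surrounding table layout is an assumption.

```lua
-- Hypothetical producer/client configuration sketch.
local client_config = {
    -- Default (requires nginx to resolve metadata.google.internal,
    -- e.g. via a `resolver` directive pointing at the cluster DNS):
    -- token_url = "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token",

    -- Workaround: use the metadata server's fixed link-local address
    -- so no DNS lookup is needed at all.
    token_url = "http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/token",
}
```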
Force-pushed from fee68c0 to 7a2efaf
Another issue I am seeing now is quite a few of these in my logs:
I have not seen them before, so I suspect it's something with my code. I'm trying to investigate.
Update: The issue with pending timers was resolved when I reverted to the default settings for the producer config. My hypothesis is that a combination of a short timer interval (1000 ms) together with my updates (token retrieval?) resulted in a build-up of timers. I'm just wondering, is this the same issue as this one on the kafka project: doujiang24/lua-resty-kafka#22? Meanwhile, I have to fine-tune the parameters, because the defaults gave me issues with buffer overflow in production.
Update: It seems the fix of reverting to the default settings only postponed the problem. It now runs for about 20-30 minutes without issues before starting to issue the same warnings (too many pending timers).
Update: Issue resolved. It was a silly mistake during testing where I used dofile instead of require on the producer, which resulted in an increasing number of producer instances with runaway timers. This PR is now ready for review 🎉
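To illustrate the root cause described above (file and module names here are illustrative, not from this repo): dofile re-executes the file on every call, so each call constructs a fresh producer with its own recurring timers, whereas require caches the module in package.loaded after the first load.

```lua
-- Hypothetical test harness; "producer.lua" stands in for whatever file
-- constructs the producer.

-- Each dofile() call re-executes the file, so every call builds a new
-- producer instance and schedules a new set of timers.
local p1 = dofile("producer.lua")
local p2 = dofile("producer.lua")   -- another producer, another set of timers

-- require() runs the module once and caches the result in package.loaded,
-- so both variables refer to the same producer instance.
local q1 = require("producer")
local q2 = require("producer")      -- cached; no new timers are created
```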
@Vasu7052 Is this something you are interested in as a feature?
This solves issue #3
Summary
What this PR aims to achieve is to make it possible to use the library in a GKE container that uses Workload Identity, instead of a credentials JSON file.
Here is the outline of the solution:
a. Perform an HTTP GET request to
http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token
with the header "Metadata-Flavor: Google"
b. Retrieve the token and expiry from the response:
{ "access_token": "...", "expires_in": 3090, "token_type": "Bearer" }
c. Update the oauth dict
The cleanest way to achieve this is probably to create a new module called "workload_identity_client", along with a new configuration table for it. The client can expose the same "get_oauth_token" method, perhaps renamed to just "get_token", so that either client can be passed into the request constructor from the producer; see the rough sketch below.
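A minimal sketch of what such a module might look like, assuming lua-resty-http and cjson are available in the OpenResty environment. The module name, config fields, and caching behavior below are assumptions for illustration, not the actual implementation in this PR.

```lua
-- workload_identity_client.lua (hypothetical sketch)
local http  = require "resty.http"
local cjson = require "cjson.safe"

local _M = {}
local mt = { __index = _M }

local DEFAULT_TOKEN_URL =
    "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"

function _M.new(config)
    config = config or {}
    return setmetatable({
        token_url    = config.token_url or DEFAULT_TOKEN_URL,
        access_token = nil,
        expires_at   = 0,
    }, mt)
end

-- Returns a cached token while it is still valid, otherwise fetches a
-- fresh one from the GKE metadata server (steps a-c in the outline above).
function _M.get_token(self)
    if self.access_token and ngx.now() < self.expires_at - 60 then
        return self.access_token
    end

    local httpc = http.new()
    local res, err = httpc:request_uri(self.token_url, {
        method  = "GET",
        headers = { ["Metadata-Flavor"] = "Google" },
    })
    if not res or res.status ~= 200 then
        return nil, "failed to fetch token: " .. (err or res.status)
    end

    local body = cjson.decode(res.body)
    if not body or not body.access_token then
        return nil, "unexpected token response"
    end

    self.access_token = body.access_token
    self.expires_at   = ngx.now() + (body.expires_in or 0)
    return self.access_token
end

return _M
```

Either this client or the existing credentials-file client could then be handed to the request constructor from the producer, as suggested above.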