Support kubernetes workload identity #4

testower · 2023-03-24T12:27:52Z

This solves issue #3

Summary

This PR makes it possible to use the library in a GKE container that uses workload identity instead of a credentials JSON file.

Here is the outline of the solution:

1. Add a configurable option to use workload identity (detect presence of config).
2. If enabled, use the following strategy to get a bearer token:
   a. Perform an HTTP GET request to http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token with the header "Metadata-Flavor: Google".
   b. Retrieve the token and expiry from the response: { "access_token": "...", "expires_in": 3090, "token_type": "Bearer" }
   c. Update the oauth dict.
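For readers who want to see steps a to c end to end, here is a minimal, language-neutral sketch in Python. It is not the Lua code from this PR. The function names (`build_token_request`, `parse_token_response`, `fetch_token`) and the keys of the returned dict (`token`, `token_type`, `expires_at`) are hypothetical and chosen only for illustration. The URL, header and response fields are the ones listed above.

```python
import json
import time
import urllib.request

# Default token endpoint from the outline above.
TOKEN_URL = ("http://metadata.google.internal/computeMetadata/v1/"
             "instance/service-accounts/default/token")


def build_token_request(url=TOKEN_URL):
    """Step a: a GET request carrying the required Metadata-Flavor header."""
    return urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})


def parse_token_response(body, now):
    """Steps b and c: extract token and expiry, return the updated values.

    `expires_in` is relative (seconds), so it is converted to an absolute
    timestamp using `now`. The key names here are illustrative assumptions.
    """
    data = json.loads(body)
    return {
        "token": data["access_token"],
        "token_type": data["token_type"],
        "expires_at": now + data["expires_in"],
    }


def fetch_token(url=TOKEN_URL, timeout=2.0):
    """Perform the request; only works where the metadata server is reachable."""
    with urllib.request.urlopen(build_token_request(url), timeout=timeout) as resp:
        return parse_token_response(resp.read(), time.time())
```

Keeping the parsing separate from the HTTP call means the expiry arithmetic can be checked without access to a GKE node.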

The cleanest way to achieve this is probably to create a new module called "workload_identity_client", along with a new configuration table for it. The client can expose the same "get_oauth_token" function, perhaps renamed to just "get_token", so that either client can be passed to the request constructor from the producer.
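The idea of two interchangeable clients behind one `get_token` function could look roughly like the sketch below. It is written in Python as a language-neutral illustration, not as the PR's Lua module. `TokenClient`, `CachingTokenClient`, `authorization_header` and the `leeway` parameter are hypothetical names, and refreshing shortly before expiry is an assumption rather than something the PR states.

```python
import time
from typing import Protocol


class TokenClient(Protocol):
    """Anything with get_token() can be handed to the request builder."""

    def get_token(self) -> str: ...


class CachingTokenClient:
    """Caches a token and refetches it shortly before it expires.

    `fetch` is any callable returning (token, expires_in_seconds); it could
    wrap the metadata server call or a credentials-file based flow.
    """

    def __init__(self, fetch, clock=time.time, leeway=60.0):
        self._fetch = fetch
        self._clock = clock
        self._leeway = leeway  # assumed safety margin, in seconds
        self._token = None
        self._expires_at = 0.0

    def get_token(self):
        if self._token is None or self._clock() >= self._expires_at - self._leeway:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = self._clock() + expires_in
        return self._token


def authorization_header(client: TokenClient) -> str:
    """The caller does not need to know which kind of client it was given."""
    return "Bearer " + client.get_token()
```

Because the caller only depends on `get_token`, switching between a credentials-file client and a workload identity client becomes a configuration decision.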

Caveat: Unable to use callbacks to assert success because of race condition

testower · 2023-03-25T22:04:13Z

Everything seems to be working as expected locally. I still need to test it in GKE and will do that on Monday.

testower · 2023-03-27T10:14:49Z

Today I have tested this live in our dev environment and everything seems to be working fine, except one little caveat. The default token_url does not work because nginx is not able to resolve metadata.google.internal.

I have tried the following solutions:

enable the local resolver (local=on)
add the VM's internal dns to the resolver (169.254.169.254)

Both of these work some, or even most, of the time, but intermittently fail with "metadata.google.internal could not be resolved (3: Host not found)".

However, using the internal name server address instead of metadata.google.internal in the token_url seems to always work.
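As a small illustration of that workaround, the sketch below swaps the hostname in a token URL for the link-local address mentioned above (169.254.169.254), so no DNS lookup is needed. It is Python for illustration only; `pin_metadata_host` is a hypothetical helper, not part of the library.

```python
from urllib.parse import urlsplit, urlunsplit

# Address mentioned above as the VM's internal DNS / metadata endpoint.
METADATA_IP = "169.254.169.254"


def pin_metadata_host(token_url, host=METADATA_IP):
    """Return token_url with its host replaced, leaving the path untouched."""
    parts = urlsplit(token_url)
    return urlunsplit(parts._replace(netloc=host))
```

The path and the "Metadata-Flavor: Google" header stay the same; only the host part of the URL changes.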

testower · 2023-03-27T11:11:56Z

Another issue I am seeing now is quite a few of these in my logs:

producer.lua:149: _timer_flush(): failed to create timer at _timer_flush, err: too many pending timers

I have not seen them before, so I suspect it's something in my code. Trying to investigate.

testower · 2023-03-27T20:52:33Z

Update: The issue with pending timers was resolved when I reverted to the default settings for the producer config. My hypothesis is that a short timer interval (1000ms) combined with my updates (token retrieval?) resulted in a build-up of timers. Is this the same issue as the one on the kafka project, doujiang24/lua-resty-kafka#22? Meanwhile, I have to fine-tune the parameters, because the defaults gave me buffer overflow issues in production.

testower · 2023-03-28T06:11:54Z

Update: It seems that reverting to the default settings only postponed the problem. It now runs for about 20-30 minutes without issues before it starts issuing the same warnings (too many pending timers).

testower · 2023-03-28T20:11:00Z

Update: Issue resolved. It was a silly mistake during testing: I used dofile instead of require on the producer, which resulted in an increasing number of producer instances with runaway timers.

This PR is now ready for review 🎉

testower · 2023-12-20T21:02:09Z

@Vasu7052 Is this something you are interested in as a feature?

testower marked this pull request as draft March 24, 2023 12:27

ankneo requested a review from Vasu7052 March 24, 2023 12:38

ankneo assigned ankneo and Vasu7052 Mar 24, 2023

testower force-pushed the oauth-token-from-path branch 4 times, most recently from 1be21da to 2eae9c2 Compare March 25, 2023 12:10

Support getting token from workload identity via metadata server

21baa11

testower force-pushed the oauth-token-from-path branch from 2eae9c2 to 21baa11 Compare March 25, 2023 12:12

testower added 4 commits March 25, 2023 21:21

Add support for workload_identity_client to producer

bf36906

Fix indentation

e7944b8

Tweaks

5a3a04c

Integration test

3f28aab

Caveat: Unable to use callbacks to assert success because of race condition

testower marked this pull request as ready for review March 25, 2023 22:03

testower added 3 commits March 25, 2023 23:05

Token url is not required, default can be used

b44b6a8

Fix typo

be6f715

Add nil check

7a2efaf

testower force-pushed the oauth-token-from-path branch from 51aeb54 to 7a2efaf Compare March 27, 2023 09:46

testower force-pushed the oauth-token-from-path branch from fee68c0 to 7a2efaf Compare March 27, 2023 10:15

Fix pointer to token_expires

fccf63a

Update rockspec

a58be86

testower changed the title ~~Support getting token from path~~ Support kubernetes workload identity Mar 29, 2023

Update default token url

8d4c1e0

testower closed this Apr 26, 2024
