Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: remove call credentials from call options if DirectPath #3670

Open
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

rockspore
Copy link
Member

@rockspore rockspore commented Mar 1, 2025

This PR eliminates the issue where call credentials get attached twice to a RPC in DirectPath cases. Particularly, when user credentials get used, the problem causes the RPC to fail due to the duplication of the quota project ID (internal-only context: b/364288002).

The approach is to strip the credentials in the callOptions returned by the GrpcCallContext if the TransportChannel is DirectPath. The side effect is that users won't be able to configure call credentials via the ApiCallContext if DirectPath is used.

We think this is acceptable because:

  1. Users can still configure the credentials via TransportChannelProvider.withCredentials(). At a higher level, this is done by configuring the CredentialsProvider in the StubSettings or the ServiceOptions.
  2. As of now, DirectPath has its own special authentication flow, in which the service account attached to the GCE VM or GKE Pod will be used. Although in some special cases, the call credentials will be used to authenticate the client's identity, the peculiar nature of DirectPath should justify us limiting the flexibility of how call credentials can be configured in this case.

Tested DirectPath using Spanner

Headers sent

[:authority: spanner.googleapis.com, :path: /google.spanner.v1.Spanner/BatchCreateSessions, :method: POST, :scheme: https, content-type: application/grpc, te: trailers, user-agent: spanner-java/6.86.0 grpc-java-netty/1.69.0, ..., grpc-accept-encoding: gzip, authorization: Bearer ya29.****, ..., authorization: Bearer 1234, grpc-timeout: 56962080u]

Bearer token is sent twice (first ya29.*** value is valid and second 1234 is invalid). The second one was attached by customizing the ApiCallContext to send an invalid CallCredentials as part of the CallOptions. The call still succeeded as the first Bearer token in the Metadata is used.

@product-auto-label product-auto-label bot added the size: m Pull request size is medium. label Mar 1, 2025
@rockspore
Copy link
Member Author

@lqiu96 I've verified this would fix the DirectPath case. No idea about the obscure failure in java-firestore yet. Will add tests next week. I think this approach is promising.

@lqiu96
Copy link
Contributor

lqiu96 commented Mar 3, 2025

I think this does work for DirectPath.

However, doesn't this issue also exist for S2A where both tokens are being passed? Also, this would override any user set CallOption value.

@rockspore
Copy link
Member Author

However, doesn't this issue also exist for S2A where both tokens are being passed?

I was wondering if we could introduce a similar API as isDirectPath() for TransportChannel for S2A w/ bound tokens so that the same approach could be used there.

Also, this would override any user set CallOption value.

This is by far the simplest approach and doesn't even need to touch how channels are constructed in any other non-DirectPath case (S2A with bound token will also need to be handled later). And somewhat biased by this, I'm hoping we could be a bit opinionated in the DirectPath case to disregard users' call credentials in the call options, given we clearly document this behavior. Note in my other PR, we are overriding the call option value as well.

@lqiu96
Copy link
Contributor

lqiu96 commented Mar 3, 2025

I was wondering if we could introduce a similar API as isDirectPath() for TransportChannel for S2A w/ bound tokens so that the same approach could be used there.

I would like this. I'm not able to easily map a way to determine this since I think it would require actually trying to create the CallCredentials and checking if they could be created. I guess that is a possibility we can also try. I was exploring dynamically setting the call options creds in #3671

I'm hoping we could be a bit opinionated in the DirectPath case to disregard users' call credentials in the call options, given we clearly document this behavior.

I would like to be opinionated on this too. However, can you remind me the behavior if something like SACreds were used instead of UserCreds, would DP would work? Would it be possible to override the calloptions creds using DP?

S2A with bound token will also need to be handled later

I'm not against this, though I would like to try and see if we can figure a generic solution to tackle sending creds twice, rather than adding one-off fixes for specific flows.

@rockspore
Copy link
Member Author

I would like this. I'm not able to easily map a way to determine this since I think it would require actually trying to create the CallCredentials and checking if they could be created. I guess that is a possibility we can also try. I was exploring dynamically setting the call options creds in #3671

S2A don't generally take the CallCredentials except for the bound token case introduced in #3591, where CompositeChannelCredentials.create is used to pass the special CallCredentials. So we can call the setter of the TransportChannel there. I could be missing something, but I don't think #3671 works because ChannelCredentials are not required to always carry the CallCredentials, e.g. TlsChannelCredentials.create() would give you an instance without any CallCredentials. We only care about whether CallCredentials have been attached to the channel here.

I would like to be opinionated on this too. However, can you remind me the behavior if something like SACreds were used instead of UserCreds, would DP would work? Would it be possible to override the calloptions creds using DP?

Any creds (SA, Users), as long as allowed right now by DirectPath, should work without any problem with this PR. It's just those creds need to be passed into the TransportChannelProvider to be used in the DirectPath channel creation. We don't enable bound tokens if the given creds are not ComputeEngine ones, but we still pass them as is into GoogleDefaultChannelCredentials.callCredentials. Since the TransportChannel.isDirectPath() == true in this case, this PR will make the call options strip the call credentials in it.

So basically, we document that DirectPath does not take in any CallCredentials from the ApiCallContext and we should be good. Is this what you were asking about?

I'm not against this, though I would like to try and see if we can figure a generic solution to tackle sending creds twice, rather than adding one-off fixes for specific flows.

In that case, I wonder if you are open to adding a more generic method to TransportChannel, say hasCallCredentials() which can be used more generically. And we don't have to add any more method for any new cases.

@lqiu96
Copy link
Contributor

lqiu96 commented Mar 3, 2025

ChannelCredentials are not required to always carry the CallCredentials

I see, so in the case of Mtls (non S2A) and just normal TLS, then the expectation is that CallCredentials should be attached via CallOptions? In that case, I think my assumption that always using ChannelCredentials instead of CallOptions is wrong.

We only care about whether CallCredentials have been attached to the channel here.

Ok I see, I think I'm on the same page now. In this case only DP and MTLS_S2A would be the cases as you've mentioned above.

we document that DirectPath does not take in any CallCredentials from the ApiCallContext and we should be good.

Yes I think if we take the opinionated route this what what we should do. The example that I was thinking above was that a user established a DP connection with some SA cred and overrides with a different SA cred via CallOptions (something like what Spanner has: https://github.com/googleapis/java-spanner/blob/7a8a29be40258294cafd13b1df7df5ea349a675d/google-cloud-spanner/src/main/java/com/google/cloud/spanner/spi/v1/GapicSpannerRpc.java#L2037). Is this a behavior that works or would or would it fail/ never worked?

I wonder if you are open to adding a more generic method to TransportChannel

Yeah, this is what the PR above was trying to explore. Adding a new method that determined if the CallCredentials was attached to the channel. When a GAPIC client is initialized, it will try to determine if this was a user set value or set by the client and it will ignore the client set value (but I think I got the logic wrong).

@rockspore rockspore changed the title remove call credentials from call options if DirectPath fix: remove call credentials from call options if DirectPath Mar 3, 2025
@rockspore rockspore marked this pull request as ready for review March 4, 2025 00:00
Comment on lines 566 to 568
if (!isDirectPath) return callOptions;
// Remove the CallCredentials attached to the callOptions if it's DirectPath.
return callOptions.withCallCredentials(null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this logic reside inside the constructor? I think it would be better for the getter to just return value.

nit: Can the comment be updated to reflect the why. Perhaps something like (probably needs better wording): CallCredentials is stripped from CallOptions because CallCredentials are attached to ChannelCredentials in DirectPath flows. Adding it again would duplicate the headers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I also modify the merge logic to respect the isDirectPath of the coming context. Also several cases I take care of here:

  1. If withChannel is subsequently called, we attach a CallCreds from the creds back to calloptions and revert isDirectPath to false.
  2. If withCallOptions is subsequently called, we strip the CallCreds from it if DirectPath.
  3. If withCredentials is subsequently called, we attach its corresponding CallCreds to the calloptions only if non-DirectPath

Please help check if I missed any corner case.

@lqiu96
Copy link
Contributor

lqiu96 commented Mar 4, 2025

I think the last thing is to update the GrpcCallContext's withCredentials and withCallOptions javadocs.

Thoughts on a small blurb like (re-phrase/ update): For certain flows like DirectPath, the channel is already created with a CallCredentials. Passing in an additional CallCredentials here will not override the ChannelCredential's CallCredential and may cause issues as the CallCredentials is duplicated. These flows will attempt to automatically strip the CallOption's CallCreds.

@rmehta19
Copy link
Contributor

rmehta19 commented Mar 5, 2025

(Not specific to this PR, but related to fixing this problem for MTLS_S2A)

I was wondering if we could introduce a similar API as isDirectPath() for TransportChannel for S2A w/ bound tokens so that the same approach could be used there.

I would like this. I'm not able to easily map a way to determine this since I think it would require actually trying to create the CallCredentials and checking if they could be created.

I think it may be more than this. In order to set isMtlsS2A() (API similar to isDirectPath), we'd need to check:

This is due to the fact that for DirectPath: if you can use DirectPath then you will use DirectPath. However, this is not true for S2A, if you can use S2A, you may end up using S2A (you could end up using DirectPath since DP supersedes S2A or you could end up falling back to TLS if you fail to create S2A channel creds).

I'll try to put together a PR taking this into account.

Edit: Actually I see that #3671 did take this into account. Although I am unclear on how we can make sure createSingleChannel gets called before setting IsCallCredentialAttachedToChannel

@rockspore
Copy link
Member Author

For posterity, we confirmed that CallCredentials attached to the ChannelCredentials will appear before the potentially same header from the CallCredentials in the CallOptions (with the current gRPC implementation) and the server only uses the first one. Example header list:

[:authority: spanner.googleapis.com, :path: /google.spanner.v1.Spanner/BatchCreateSessions, :method: POST, :scheme: https, content-type: application/grpc, te: trailers, user-agent: spanner-java/6.86.0 grpc-java-netty/1.69.0, ..., grpc-accept-encoding: gzip, authorization: Bearer ya29.****, ..., authorization: Bearer 1234, grpc-timeout: 56962080u]

The garbage token in the second authorization header didn't cause the call to fail the authn/z.

In general, we shouldn't rely on this gRPC implementation detail since it could in theory change the appending order and break everything. This PR will fix the duplication header issue so we are good.

return new GrpcCallContext(
transportChannel.getChannel(),
credentials,
transportChannel.isDirectPath() ? callOptions.withCallCredentials(null) : callOptions,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I meant can we do this logic/ check inside the private constructor? This should only pass the calloptions.

Constructor contains the:
this.callOptions = isDirectPath() ? calloptions.withCallCredentials(null) : callOptions logic

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah this makes so much sense! Sorry I don't know what I was thinking. Changed.

@lqiu96 lqiu96 requested a review from blakeli0 March 5, 2025 20:53
@lqiu96
Copy link
Contributor

lqiu96 commented Mar 5, 2025

/gcbrun

@lqiu96 lqiu96 requested a review from zhumin8 March 5, 2025 20:54
@lqiu96
Copy link
Contributor

lqiu96 commented Mar 6, 2025

/gcbrun

@@ -96,6 +96,7 @@ public final class GrpcCallContext implements ApiCallContext {
private final ImmutableMap<String, List<String>> extraHeaders;
private final ApiCallContextOptions options;
private final EndpointContext endpointContext;
private final boolean isDirectPath;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was an issue reported that isDirectPath may not be correct populated in some cases. @lqiu96 is looking into it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the info. I thought that referred to canUseDirectPath() which could be invoked multiple times before and after the credentials was set so might be no accurate, but by the time the ClientContext got this boolean from the channel, it should already be final.

Copy link
Contributor

@lqiu96 lqiu96 Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked with @surbhigarg92 regarding this to get some more information about the issue. This issue from Spanner's POV is not so much that the canUseDirectPath value that is client is initialized with and used by the TransportChannel is incorrect, it's that Spanner doesn't have a way to get the canUseDirectPath value when initializing the client for Otel.

Spanner was using GrpcSpannerStub.create(StubSettings) and didn't have an easy way to access to TransportChannel's fields (StubSettings only exposes getTransportChannelProvider(). Spanner could use GrpcSpannerStub.create(ClientContext), but that creates the Stub with a new StubSettings and not the one they manually configured.

For them, the DirectPath transportchannel was always created correctly and the value the client uses is correct. They used this workaround to be able to access the field for their use case.

I think their issue is a valid concern, but is different from this.

@lqiu96 lqiu96 requested a review from blakeli0 March 7, 2025 17:50
@@ -228,7 +250,8 @@ public GrpcCallContext withEndpointContext(EndpointContext endpointContext) {
options,
retrySettings,
retryableCodes,
endpointContext);
endpointContext,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its unfortunate that we have to change all the places that use the constructor. Ideally, if we had a builder for GrpcCallContext, the code here would be simplified to this.toBuilder().setEndpointContext(endpointContext).build(), and we don't have to change the code here at all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. But I'd prefer to leave the refactoring with a builder out of this PR since it would look cleaner. I can open an issue for it. Let me know what you think.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG. Yes please create a separate issue and we can put it in our backlog.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Created #3681

@blakeli0
Copy link
Collaborator

blakeli0 commented Mar 7, 2025

/gcbrun

@rockspore
Copy link
Member Author

Thanks for all the reviews. No rush but just in case, please help merge it when you see fit since I don't have the permission.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size: m Pull request size is medium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants