-
Notifications
You must be signed in to change notification settings - Fork 325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add properties to set universe domain and endpoint in bigquery #3158
Conversation
private String universeDomain; | ||
private String jsonWriterEndpoint; | ||
|
||
private String host; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling out the addition of this property since this is a deviation from our original design of exposing universe domain and endpoint. Apiary libraries such as Bigquery and Storage only allow for customizing host
(which is essentially endpoint without the port) instead of endpoint
so we are doing the same in the Spring Cloud GCP layer. For context: googleapis/sdk-platform-java#2329
bigQueryWriteSettingsBuilder.setUniverseDomain(this.universeDomain); | ||
} | ||
if (this.jsonWriterEndpoint != null) { | ||
bigQueryWriteSettingsBuilder.setEndpoint(this.jsonWriterEndpoint); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like jsonWriterEndpoint
is only used for BigQueryWriteClient
, which is actually part of bigquery-storage, not bigquery. Ideally I think we should have different auto-configuration for them, but since they are in one auto-configuration, can we either expose host
or endpoint
? And construct the other dynamically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right, it's only used for BigqueryWriteClient (bigquery storage). We can expose endpoint
and strip the port and reformat it to the pattern that DEFAULT_HOST
currently has when setting it to BigqueryOptions.
However, the concern I have with this is that it would prevent us from making endpoint/host configurable on a per-service basis. In fact, looking at the code again, I'm wondering if it was a mistake to make universe domain shared between bigquery and bigquerystorage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we share anything else between Bigquery and Bigquerystorage client? Ideally they should be separate as they are two different services, but looks like we are treating them the same in spring-cloud-gcp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question - you're right, they do seem pretty interconnected in spring-cloud-gcp.
BigqueryTemplate
wraps both Bigquery and BigQueryWriteClient and both services share the spring.cloud.gcp.bigquery.threadPoolSize
property and spring.cloud.gcp.bigquery.datasetName
property. Dataset name is used by the BigtableTemplate#writeJsonStream
method which uses Bigquery to create a table before invoking the BigQuery Storage Write API.
spring.cloud.gcp.bigquery.jsonWriterBatchSize
is the only property that is unique to bigquery storage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have two options
- Continue treating bigquery and bigquerystorage the same product. Then we only introduce
universe-domain
, and eitherhost
orendpoint
. - Start treating them two different product. Then we would introduce
universe-domain
andhost
for bigquery,universe-domain
andendpoint
for bigquerystorage.
Any preferences? cc: @burkedavison
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, considering that both bigquery and bigquerystorage are wrapped as one product in spring-cloud-gcp, we will expose global endpoint and universe domain properties. The port will be omitted from the endpoint when set to the bigquery client.
ctx -> { | ||
BigQueryWriteClient client = ctx.getBean(BigQueryWriteClient.class); | ||
assertThat(client.getSettings().getEndpoint()) | ||
.isEqualTo("bigquerystorage.example.com:123"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please verify that it's intended to have getEndpoint not return the URI scheme, but getResolvedApiaryHost does return the URI scheme.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, returning endpoint with this pattern is intended for Bigquerystorage which will otherwise throw an exception if it doesn't follow a specific convention. For example, passing in bigquerystorage.example.com
without the port will result in:
Caused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.google.cloud.bigquery.storage.v1.BigQueryWriteClient]: Factory method 'bigQueryWriteClient' threw exception with message: invalid endpoint, expecting "<host>:<port>"
at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:178)
at org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:644)
... 31 more
Caused by: java.lang.IllegalArgumentException: invalid endpoint, expecting "<host>:<port>"
On the other hand bigquery accepts either accepts user-provided endpoint as-is if one is provided or builds the host in the https://service.universeDomain
format if only the universe domain is provided:
https://github.com/googleapis/sdk-platform-java/blob/1a988df22f7e3d15ce6b121bf26897c59ab468e4/java-core/google-cloud-core/src/main/java/com/google/cloud/ServiceOptions.java#L868 but doesn't contain an explicit check to verify this schema. And in this PR, we reformat the provided endpoint to follow the same pattern that getResolvedApiaryHost() defaults to.
} | ||
URI uri = new URI(endpoint); | ||
|
||
// Construct the new URL with https:// and no port |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just do String manipulation is host is basically endpoint without ports? Is there any edge cases can not be handled by String manipulation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Simplified to use String manipulation.
Quality Gate passedIssues Measures |
This PR introduces the following changes:
spring.cloud.gcp.bigquery.universe-domain
property to customize the universe domain inBigqueryOptions
(option to customize endpoint is not provided in the client library) andBigQueryWriteClient
if these values aren'tnull
.spring.cloud.gcp.bigquery.endpoint
property to customize the endpoint and host inBigQueryWriteClient
and if this value isn'tnull
. Since Biqguery acceptshost
instead ofendpoint
, the endpoint is formatted to thehttps://service.universeDomain
format before being set to Bigquery.GcpBigQueryAutoConfigurationTests
verify that the client settings have been correctly set.End-to-end verification with test environment endpoints have been documented in a separate test plan document.