Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add properties to set universe domain and endpoint in bigquery #3158

Merged
merged 12 commits into from
Sep 26, 2024

Conversation

mpeddada1
Copy link
Contributor

@mpeddada1 mpeddada1 commented Aug 27, 2024

This PR introduces the following changes:

  • Introduces spring.cloud.gcp.bigquery.universe-domain property to customize the universe domain in BigqueryOptions (option to customize endpoint is not provided in the client library) and BigQueryWriteClient if these values aren't null.
  • Introduces spring.cloud.gcp.bigquery.endpoint property to customize the endpoint and host in BigQueryWriteClient and if this value isn't null. Since Biqguery accepts host instead of endpoint, the endpoint is formatted to the https://service.universeDomain format before being set to Bigquery.
  • Adds unit tests in GcpBigQueryAutoConfigurationTests verify that the client settings have been correctly set.

End-to-end verification with test environment endpoints have been documented in a separate test plan document.

@mpeddada1 mpeddada1 marked this pull request as ready for review September 16, 2024 16:51
@mpeddada1 mpeddada1 requested a review from a team as a code owner September 16, 2024 16:51
private String universeDomain;
private String jsonWriterEndpoint;

private String host;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling out the addition of this property since this is a deviation from our original design of exposing universe domain and endpoint. Apiary libraries such as Bigquery and Storage only allow for customizing host (which is essentially endpoint without the port) instead of endpoint so we are doing the same in the Spring Cloud GCP layer. For context: googleapis/sdk-platform-java#2329

@burkedavison @blakeli0

bigQueryWriteSettingsBuilder.setUniverseDomain(this.universeDomain);
}
if (this.jsonWriterEndpoint != null) {
bigQueryWriteSettingsBuilder.setEndpoint(this.jsonWriterEndpoint);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like jsonWriterEndpoint is only used for BigQueryWriteClient, which is actually part of bigquery-storage, not bigquery. Ideally I think we should have different auto-configuration for them, but since they are in one auto-configuration, can we either expose host or endpoint? And construct the other dynamically?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, it's only used for BigqueryWriteClient (bigquery storage). We can expose endpoint and strip the port and reformat it to the pattern that DEFAULT_HOST currently has when setting it to BigqueryOptions.

However, the concern I have with this is that it would prevent us from making endpoint/host configurable on a per-service basis. In fact, looking at the code again, I'm wondering if it was a mistake to make universe domain shared between bigquery and bigquerystorage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we share anything else between Bigquery and Bigquerystorage client? Ideally they should be separate as they are two different services, but looks like we are treating them the same in spring-cloud-gcp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question - you're right, they do seem pretty interconnected in spring-cloud-gcp.
BigqueryTemplate wraps both Bigquery and BigQueryWriteClient and both services share the spring.cloud.gcp.bigquery.threadPoolSize property and spring.cloud.gcp.bigquery.datasetName property. Dataset name is used by the BigtableTemplate#writeJsonStream method which uses Bigquery to create a table before invoking the BigQuery Storage Write API.

spring.cloud.gcp.bigquery.jsonWriterBatchSize is the only property that is unique to bigquery storage.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have two options

  1. Continue treating bigquery and bigquerystorage the same product. Then we only introduce universe-domain, and either host or endpoint.
  2. Start treating them two different product. Then we would introduce universe-domain and host for bigquery, universe-domain and endpoint for bigquerystorage.
    Any preferences? cc: @burkedavison

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, considering that both bigquery and bigquerystorage are wrapped as one product in spring-cloud-gcp, we will expose global endpoint and universe domain properties. The port will be omitted from the endpoint when set to the bigquery client.

ctx -> {
BigQueryWriteClient client = ctx.getBean(BigQueryWriteClient.class);
assertThat(client.getSettings().getEndpoint())
.isEqualTo("bigquerystorage.example.com:123");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please verify that it's intended to have getEndpoint not return the URI scheme, but getResolvedApiaryHost does return the URI scheme.

LGTM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, returning endpoint with this pattern is intended for Bigquerystorage which will otherwise throw an exception if it doesn't follow a specific convention. For example, passing in bigquerystorage.example.com without the port will result in:

Caused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.google.cloud.bigquery.storage.v1.BigQueryWriteClient]: Factory method 'bigQueryWriteClient' threw exception with message: invalid endpoint, expecting "<host>:<port>"
	at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:178)
	at org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:644)
	... 31 more
Caused by: java.lang.IllegalArgumentException: invalid endpoint, expecting "<host>:<port>"

On the other hand bigquery accepts either accepts user-provided endpoint as-is if one is provided or builds the host in the https://service.universeDomain format if only the universe domain is provided:
https://github.com/googleapis/sdk-platform-java/blob/1a988df22f7e3d15ce6b121bf26897c59ab468e4/java-core/google-cloud-core/src/main/java/com/google/cloud/ServiceOptions.java#L868 but doesn't contain an explicit check to verify this schema. And in this PR, we reformat the provided endpoint to follow the same pattern that getResolvedApiaryHost() defaults to.

}
URI uri = new URI(endpoint);

// Construct the new URL with https:// and no port
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just do String manipulation is host is basically endpoint without ports? Is there any edge cases can not be handled by String manipulation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Simplified to use String manipulation.

Copy link

@mpeddada1 mpeddada1 requested a review from blakeli0 September 26, 2024 14:38
@mpeddada1 mpeddada1 merged commit 9b3c780 into main Sep 26, 2024
69 of 76 checks passed
@mpeddada1 mpeddada1 deleted the bigquery-universeDomain branch September 26, 2024 19:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants