Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add max_batching_delay option to CommitRequest. #11939

Closed
wants to merge 8 commits into from

Conversation

nginsberg-google
Copy link
Contributor

@nginsberg-google nginsberg-google commented Jun 21, 2023

This change is Reviewable

@nginsberg-google nginsberg-google requested a review from a team as a code owner June 21, 2023 23:16
@snippet-bot
Copy link

snippet-bot bot commented Jun 21, 2023

Here is the summary of changes.

You are about to add 1 region tag.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.
To update this comment, add snippet-bot:force-run label or use the checkbox below:

  • Refresh this comment

@nginsberg-google nginsberg-google changed the title Add max_batching_delay option to CommitRequest. feat: Add max_batching_delay option to CommitRequest. Jun 21, 2023
@devbww
Copy link
Contributor

devbww commented Jun 21, 2023

/gcbrun

@@ -70,6 +71,16 @@ class CommitOptions {
return request_priority_;
}

CommitOptions& set_max_batching_delay(
absl::optional<absl::Duration> max_batching_delay) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not use absl::Duration in a public API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, changed to type int and renamed to max_batching_delay_ms.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion. As @coryan pointed out, "should not use absl::Duration in a public API" means to use std::chrono::duration instead.

@@ -3802,6 +3802,16 @@ void QueryInformationSchemaDatabaseOptions(
}
// [END spanner_query_information_schema_database_options]

// [START spanner_set_batching_delay]
void CommitWithBatchingDelay(google::cloud::spanner::Client client,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example does not show me anything that I did not know from reading the prototype. Do we really need it?

Can we change it to do something less obvious?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, batching_delay is unused.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

Comment on lines 21 to 22
#include "absl/types/optional.h"
#include "absl/time/time.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the checkers build failure for problems with include ordering and indentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I look at that build failure it appears empty. When I click "see more information" it looks like I don't have permission. Am I missing something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are other checkers failures too. Let me know if you still can't see them and we can try to fix that and/or quote them for you.

@@ -1017,6 +1017,12 @@ StatusOr<spanner::CommitResult> ConnectionImpl::CommitImpl(
request.mutable_request_options()->set_priority(
ProtoRequestPriority(params.options.request_priority()));

if (params.options.max_batching_delay().has_value()) {
*request.mutable_max_batching_delay() =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no max_batching_delay field in the published CommitRequest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field is currently hidden by a TRUSTED_TESTER label. My impression was the order in which I was supposed to do things was:

  1. Get this PR (and an equivalent one for other client libraries) approved
  2. Remove TT label
  3. Merge these PRs.

I have changed my version of this codebase to include the newly generated code when the TT label is removed, but do not have permission to add it to a branch on this codebase. Please let me know if I'm doing something wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sequence will not work for C++. For complicated reasons1, the C++ clients can only compile against public protos. In this case the public version of CommitRequest does not have the field you want to use.

Footnotes

  1. The C++ client libraries cannot enforce what version of Protobuf our customers will use, and the generated *.pb.h and *.pb.cc files can only work with the version of Protobuf that generated them. Therefore, these files are not included in the Cloud C++ source, the customer generates these files, and the customer only has access to the public proto files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, maybe I'm missing something. Step (2) in the sequence I listed will make the CommitRequest proto contain this field. Once we have a PR for each client library that the client library team is happy with, I will remove the visibility label internally and then the public version of CommitRequest will contain the max_batching_delay field. At that point we will be able to merge this PR. Is there a different sequence of steps the client library team usually takes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need at least:

2.5) Update the googleapis SHA in this repository to use a newer version.
2.6) Get the code to actually compile and pass under CI.

Is there a different sequence of steps the client library team usually takes?

I usually don't start writing code in C++ until the protos are published. I think @devbww has created patches for the proto files and then tweaks the build scripts to apply these patches (there are placeholders in CMake and Bazel to support this). For obvious reasons that work happens in branches that do not get published to PRs or any public location.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @coryan points out, we really need the new protos published publicly, and google-cloud-cpp reconfigured to use the new protos, before worrying too much about actually using those new protos.

That said, we're happy to comment on the state of the PR, but having it in a non-compilable state make things more difficult.

@@ -3802,6 +3802,16 @@ void QueryInformationSchemaDatabaseOptions(
}
// [END spanner_query_information_schema_database_options]

// [START spanner_set_batching_delay]
void CommitWithBatchingDelay(google::cloud::spanner::Client client,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, batching_delay is unused.

void CommitWithBatchingDelay(google::cloud::spanner::Client client,
google::cloud::spanner::Mutations mutations,
absl::Duration batching_delay) {
google::cloud::spanner::CommitOptions options;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CommitOptions is a deprecated part of the public API. Whatever is needed here should be expressed in terms of google::cloud::Options, and then by calling another Commit() overload.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've done this. This field is much like RequestPriorityOption, and I've pretty much copied it. I don't think we need to overload Commit() again, as the current versions all take an Options field. Please take a look and let me know if I've messed this up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the current overloads take an Options argument, not a CommitOptions. The latter is deprecated, and we shouldn't be changing it. Whatever is needed here should only be expressed in terms of google::cloud::Options.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry ... my "only" was too strict. We still need your modifications to CommitOptions as they are used internally (and for mocking). But we don't want to use CommitOptions as part of any sample. Instead, the sample should use MaxBatchingDelayOption directly.

@@ -3802,6 +3802,16 @@ void QueryInformationSchemaDatabaseOptions(
}
// [END spanner_query_information_schema_database_options]

// [START spanner_set_batching_delay]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tag does not appear in the published list. If we're making up a new one, it should at least include "commit".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the tag. This is new - is there somewhere I need to add it that I've missed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Firstly, it looks like you only updated the START tag, and not the END one.

But to the bigger issue, the point of these tags is to delimit regions that will be scraped into the Cloud Spanner documentation, so introducing a tag requires some cross-team coordination. For example, here are all the defined Spanner tags containing "commit": http://devrel/snippets?pageSize=100&sort=region_tag&direction=asc&product_id=9&region_tag=%25commit%25

Normally the tech writers would introduce the tag when they document the feature, otherwise it has no purpose.

@@ -3802,6 +3802,16 @@ void QueryInformationSchemaDatabaseOptions(
}
// [END spanner_query_information_schema_database_options]

// [START spanner_set_batching_delay]
void CommitWithBatchingDelay(google::cloud::spanner::Client client,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The samples should be executed, but no one calls this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@coryan
Copy link
Contributor

coryan commented Jul 21, 2023

@nginsberg-google do you think you will have time to work on this PR? If not, I think we will just close it.

@nginsberg-google
Copy link
Contributor Author

Yes, I will still work on this PR. Sorry for the delay. Please don't close it :)

Copy link
Contributor Author

@nginsberg-google nginsberg-google left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate the detailed review. This is now ready for rereview - PTAL. Thanks!

Comment on lines 21 to 22
#include "absl/types/optional.h"
#include "absl/time/time.h"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I look at that build failure it appears empty. When I click "see more information" it looks like I don't have permission. Am I missing something?

@@ -70,6 +71,16 @@ class CommitOptions {
return request_priority_;
}

CommitOptions& set_max_batching_delay(
absl::optional<absl::Duration> max_batching_delay) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, changed to type int and renamed to max_batching_delay_ms.

@@ -1017,6 +1017,12 @@ StatusOr<spanner::CommitResult> ConnectionImpl::CommitImpl(
request.mutable_request_options()->set_priority(
ProtoRequestPriority(params.options.request_priority()));

if (params.options.max_batching_delay().has_value()) {
*request.mutable_max_batching_delay() =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field is currently hidden by a TRUSTED_TESTER label. My impression was the order in which I was supposed to do things was:

  1. Get this PR (and an equivalent one for other client libraries) approved
  2. Remove TT label
  3. Merge these PRs.

I have changed my version of this codebase to include the newly generated code when the TT label is removed, but do not have permission to add it to a branch on this codebase. Please let me know if I'm doing something wrong.

@@ -3802,6 +3802,16 @@ void QueryInformationSchemaDatabaseOptions(
}
// [END spanner_query_information_schema_database_options]

// [START spanner_set_batching_delay]
void CommitWithBatchingDelay(google::cloud::spanner::Client client,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@@ -3802,6 +3802,16 @@ void QueryInformationSchemaDatabaseOptions(
}
// [END spanner_query_information_schema_database_options]

// [START spanner_set_batching_delay]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the tag. This is new - is there somewhere I need to add it that I've missed?

@@ -3802,6 +3802,16 @@ void QueryInformationSchemaDatabaseOptions(
}
// [END spanner_query_information_schema_database_options]

// [START spanner_set_batching_delay]
void CommitWithBatchingDelay(google::cloud::spanner::Client client,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

void CommitWithBatchingDelay(google::cloud::spanner::Client client,
google::cloud::spanner::Mutations mutations,
absl::Duration batching_delay) {
google::cloud::spanner::CommitOptions options;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've done this. This field is much like RequestPriorityOption, and I've pretty much copied it. I don't think we need to overload Commit() again, as the current versions all take an Options field. Please take a look and let me know if I've messed this up.

Comment on lines +3957 to +3964
Options ops;
ops.set<MaxBatchingDelayMsOption>(100);
auto commit_result = client.Commit(spanner::Mutations{
spanner::UpdateMutationBuilder("Albums",
{"SingerId", "AlbumId", "MarketingBudget"})
.EmplaceRow(1, 1, 100000)
.EmplaceRow(2, 2, 500000)
.Build()}, ops);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Options ops;
ops.set<MaxBatchingDelayMsOption>(100);
auto commit_result = client.Commit(spanner::Mutations{
spanner::UpdateMutationBuilder("Albums",
{"SingerId", "AlbumId", "MarketingBudget"})
.EmplaceRow(1, 1, 100000)
.EmplaceRow(2, 2, 500000)
.Build()}, ops);
auto commit_result = client.Commit(spanner::Mutations{
spanner::UpdateMutationBuilder("Albums",
{"SingerId", "AlbumId", "MarketingBudget"})
.EmplaceRow(1, 1, 100000)
.EmplaceRow(2, 2, 500000)
.Build()},
google::cloud::Options{}.set<spanner::MaxBatchDelayOption>(std::chrono::milliseconds(100));

Comment on lines +240 to +243
struct MaxBatchingDelayMsOption {
using Type = int;
};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we would want to use int to represent time intervals:

Suggested change
struct MaxBatchingDelayMsOption {
using Type = int;
};
struct MaxBatchingDelayOption {
using Type = std::chrono::milliseconds;
};

@@ -70,6 +70,16 @@ class CommitOptions {
return request_priority_;
}

CommitOptions& set_max_batching_delay_ms(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@devbww do we want to keep adding methods to this class? Can we support this new feature only via google::cloud::Options{}?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No to spanner::CommitOptions. Yes, to google::cloud::Options.

@devbww
Copy link
Contributor

devbww commented Jul 26, 2023

/gcbrun

@coryan
Copy link
Contributor

coryan commented Jul 27, 2023

FWIW, most builds are useless in this case. Except for this:

https://github.com/googleapis/google-cloud-cpp/pull/11939/checks?check_run_id=15378864615

@coryan
Copy link
Contributor

coryan commented Nov 30, 2023

This PR is looking really stale. What is the next step here?

@devbww
Copy link
Contributor

devbww commented Nov 30, 2023

This PR is looking really stale. What is the next step here?

CommitRequest.max_batching_delay is still unpublished. Indeed, the internal version now carries a "This field will be removed" warning, seemingly in favor of a new (also unpublished) max_commit_delay field.

So, at best this seems obsolete, and should be closed.

@coryan
Copy link
Contributor

coryan commented Nov 30, 2023

Closing for now. It can be reopened if needed or the same branch can be used to push a new PR.

@coryan coryan closed this Nov 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants