Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throttler multi-metrics: proto changes #16040

Conversation

shlomi-noach
Copy link
Contributor

Description

This PR is the first in the series of incremental changes breaking down #16039 into smaller parts.

We begin by applying the proto changes.

Notable:

  • The CheckThrottler gRPC call already exists in v19 and before, and is/was used by the Primary throttler to probe its replicas, and by the replica throttler to notify the Primary about recent checks.
  • Here, we enhance CheckThrottlerRequest and CheckThrottlerResponse with more fields. We will later make CheckThrottler a vtctidlclient command. A check requests identifies by app name, and now also clearly indicates whether multi-metrics are supported/desired (important for backwards compatibility and cross-version interaction), as well as other flags that will be come clearer later on.
  • GetThrottlerStatus, GetThrottlerStatusRequest, GetThrottlerStatusResponse are new calls, and useful for debugging/incident analysis. These return the equivalent of /throttler/status, but now via gRPC instead of HTTP. Likewise, this will be implemented as a vtctldclient in a followup PR.
  • ThrottlerConfig now supports:
    • AppCheckedMetrics - mapping of app->metrics (meaning which metrics the throttler checks on behalf of given app)
    • MetricThresholds - per metric threshold (different threshold for lag, for threads_running, etc.)

See the docs in #16039 main comment for more.

This PR merges into vitessio:throttler-multi-metrics-incremental, which is a Draft PR, and not into main.

https://github.com/planetscale/vitess/compare/vitessio:vitess:throttler-multi-metrics-incremental...planetscale:vitess:throttler-multi-metrics-incremental-proto?expand=1

Related Issue(s)

#15988

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

@shlomi-noach shlomi-noach added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Throttler labels Jun 2, 2024
Copy link
Contributor

vitess-bot bot commented Jun 2, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsBackportReason If backport labels have been applied to a PR, a justification is required NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Jun 2, 2024
@github-actions github-actions bot added this to the v20.0.0 milestone Jun 2, 2024
@shlomi-noach shlomi-noach removed NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsWebsiteDocsUpdate What it says NeedsIssue A linked issue is missing for this Pull Request NeedsBackportReason If backport labels have been applied to a PR, a justification is required labels Jun 2, 2024
Copy link

codecov bot commented Jun 2, 2024

Codecov Report

Attention: Patch coverage is 0% with 12 lines in your changes missing coverage. Please review.

Project coverage is 68.55%. Comparing base (135a6a8) to head (5e2ee27).

Files Patch % Lines
go/vt/vtctl/grpcvtctldclient/client_gen.go 0.00% 8 Missing ⚠️
go/vt/vtctl/localvtctldclient/client_gen.go 0.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16040      +/-   ##
==========================================
- Coverage   68.56%   68.55%   -0.02%     
==========================================
  Files        1544     1544              
  Lines      197873   197885      +12     
==========================================
- Hits       135670   135653      -17     
- Misses      62203    62232      +29     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

proto/tabletmanagerdata.proto Outdated Show resolved Hide resolved
proto/tabletmanagerdata.proto Outdated Show resolved Hide resolved
proto/tabletmanagerdata.proto Show resolved Hide resolved
proto/vtctldata.proto Show resolved Hide resolved
proto/vtctldata.proto Outdated Show resolved Hide resolved
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Copy link
Contributor

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Only had a couple of minor comments/questions.

bool ok_if_not_exists = 4;
// MultiMetricsEnabled is always set to "true" and is how a multi-metrics enabled replica
// throttler knows its being probed by a multi-metrics enabled primary vttablet.
bool multi_metrics_enabled = 5;
Copy link
Contributor

@mattlord mattlord Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's true, why do we need it? The zero value will be false so we'll have to be sure we're setting it to true everywhere. When should it be false? Is it related to backwards compatibility, or mixed tablet versions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it related to backwards compatibility,
Yes! As per the comment to this variable which I could probably rephrase better.

... and is how a multi-metrics enabled replica
// throttler knows its being probed by a multi-metrics enabled primary vttablet.

Meaning a "new" replica would respond like an "old" replica would, if this boolean is false, because that indicates an "old" primary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flag will be completely removed in v22.

// Scope used in this check
string scope = 7;
}
map<string, Metric> metrics = 7;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the key here? Can you please add a comment on that? Maybe it's the tablet UID or something (the larger PR probably answers this). If it's name, however, then we could remove name from the Metric message itself.

Copy link
Contributor Author

@shlomi-noach shlomi-noach Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments in code.

  // Metrics is a map (metric name -> metric value/error) so that the client has as much
  // information as possible about all the checked metrics.

message GetThrottlerStatusRequest {
}

message GetThrottlerStatusResponse {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! So I think this is a migration of the HTTP status endpoint to an RPC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct!

Comment on lines +807 to +808
vttime.Time last_healthy_at = 1;
int64 seconds_since_last_healthy = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a little redundant to me, but fine as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's redundant -- but great for metrics! Which is why I added it. This represents really well as a gauge.

Signed-off-by: Shlomi Noach <[email protected]>
@shlomi-noach
Copy link
Contributor Author

Change of plans. By popular demand, rebasing this against main. We've just cut from 20.0 to 21.0 and have all the time in the world to make changes. #15988 will then rebase on the merged changes.

@shlomi-noach shlomi-noach changed the base branch from throttler-multi-metrics-incremental to main June 17, 2024 15:22
Signed-off-by: Shlomi Noach <[email protected]>
@shlomi-noach
Copy link
Contributor Author

Rebased on main, resolved conflicts, will merge once tests are green.

Signed-off-by: Shlomi Noach <[email protected]>
@shlomi-noach shlomi-noach merged commit 84976b1 into vitessio:main Jun 17, 2024
96 checks passed
@shlomi-noach shlomi-noach deleted the throttler-multi-metrics-incremental-proto branch June 17, 2024 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Throttler Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants