-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug Report: empty ListShardTablet result when remote cell topology is down #13801
Comments
https://github.com/vitessio/vitess/blob/v19.0.3/go/vt/topo/tablet.go#L553 If endpoint does not have cached etcd connection, then timout of 5s is used to connect to etcd and fail early and then endpoint returns partial results |
Can you enumerate which commands exactly you used to test this on v19? The old |
Just slightly modified tests provided in reproduction steps are used. The code is collapsed in issue definition. // Service Vtctl allows you to call vt commands through gRPC.
service Vtctl {
rpc ExecuteVtctlCommand (vtctldata.ExecuteVtctlCommandRequest) returns (stream vtctldata.ExecuteVtctlCommandResponse) {};
} What is basically happening in test:
Some of our apps (rails/go) use Vtctld GRPC endpoint to discover cluster metadata. In case cell is down, this command fails and returns empty results. I don't see how |
My question was specifically about v19. Because you linked to code from that version. I thought that you must have tested that version. Docs for GetTablets: |
@DeathBorn to be clear, Now, having said all of that... the behavior you noted is an issue that exists with the current client code and RPCs on Did you have anything specific in mind? Using |
In looking at this:
It seems to me that by default ( In looking at the server side code, we should indeed already log a warning and return partial results by default: vitess/go/vt/vtctl/grpcvtctldserver/server.go Lines 2075 to 2255 in 1fd6d90
So I'll have to work on a test that is based on |
So far, the new tests have confirmed that we do actually show partial results by default: #15829 I'll add some additional ones but assuming they all confirm that we're already doing what you want in v14+, I think that we should close this as fixed when that PR with new tests is merged. Please let me know if you feel that I'm missing or misunderstanding anything @DeathBorn . |
@DeathBorn a related aside/FYI, in v19+ the "new vtctl" is https://vitess.io/docs/19.0/reference/programs/vtctldclient/#synopsis You'll then have a |
@mattlord We have some non golang services which depend on vtctld grpc api to perform some actions. Vtctld does not expose the "new" api. Do you have plans to change it ? |
@DeathBorn I don't think I understand. Perhaps you're not including it in the
|
@DeathBorn Here's an example usage:
I'm happy to make some changes related to this if they are needed, but I'm not sure what the issue may be. |
🤦 exactly this.... I have this We can close with additional tests added. 🙇 |
Overview of the Issue
When remote cell topology is unreachable (example: in case of cutting internet cables),
vtctl
binary andvtctlclient
/vtctld GRPC api
return different results.Example test case provided with 2 tablets in different cells. When second tablet cell topology is down,
vtctl
would return only available cell tabletvtctlclient
/vtctld GRPC api
would takeRemoteOperationTimeout
long to execute and nothing would be returnedWe should return at leat available cell tablets. In this case, many applications which rely on GRPC results might brake in unexpected ways.
Reproduction Steps
Unit tests
unit test file placed here
go/vt/vtctld/api2_test.go
Unit test results
This can be reproduced on `v11` and on current master branch (a30b19e28b2aea7a98a2909a4d1a73b822251e66)
Operating System and Environment details
Log Fragments
No response
The text was updated successfully, but these errors were encountered: