Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make vttablet wait for vt_dba user to be granted privileges #14565

Merged
merged 15 commits into from
Nov 28, 2023

Conversation

GuptaManan100
Copy link
Member

@GuptaManan100 GuptaManan100 commented Nov 21, 2023

Description

This PR adds a wait to the vttablet startup. The init_db.sql file is not guaranteed to have completed applying before the vttablet initializes because vttablet and mysqlctl could have started parallely. This causes the issue described in #14562. So, during vttablet startup, we wait for the dba user to have access to the SUPER privilege.

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on the CI
  • Documentation was added or is not required

Deployment Notes

@GuptaManan100 GuptaManan100 added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Cluster management labels Nov 21, 2023
Copy link
Contributor

vitess-bot bot commented Nov 21, 2023

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Nov 21, 2023
@github-actions github-actions bot added this to the v19.0.0 milestone Nov 21, 2023
go/test/endtoend/utils/mysql.go Outdated Show resolved Hide resolved
go/vt/mysqlctl/mycnf.go Outdated Show resolved Hide resolved
go/vt/mysqlctl/mycnf.go Outdated Show resolved Hide resolved
@@ -273,4 +308,5 @@ func init() {
Main.Flags().DurationVar(&tableACLConfigReloadInterval, "table-acl-config-reload-interval", tableACLConfigReloadInterval, "Ticker to reload ACLs. Duration flag, format e.g.: 30s. Default: do not reload")
Main.Flags().StringVar(&tabletPath, "tablet-path", tabletPath, "tablet alias")
Main.Flags().StringVar(&tabletConfig, "tablet_config", tabletConfig, "YAML file config for tablet")
Main.Flags().DurationVar(&dbaGrantWaitTime, "dba-grant-wait-time", dbaGrantWaitTime, "Time to wait for dba user to be granted the required permissions. Setting the value to 0 disable the wait.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm iffy about adding two new flags for this. Do we have an idea of what a good default wait time would be in the specific issue you are trying to address?

Copy link
Member

@deepthi deepthi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why the tests are failing. It will be interesting to see what those failures reveal.

@deepthi deepthi requested a review from dbussink November 21, 2023 17:26
@GuptaManan100
Copy link
Member Author

The test failures are all using MysQL 5.7. Unit test and also the end to end tests.
I looked at the output for SHOW GRANTS and it is different for both the releases -

mysql [localhost:8032] {msandbox} ((none)) > show grants\G
*************************** 1. row ***************************
Grants for msandbox@localhost: GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, RELOAD, SHUTDOWN, PROCESS, FILE, REFERENCES, INDEX, ALTER, SHOW DATABASES, SUPER, CREATE TEMPORARY TABLES, LOCK TABLES, EXECUTE, REPLICATION SLAVE, REPLICATION CLIENT, CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, EVENT, TRIGGER, CREATE TABLESPACE, CREATE ROLE, DROP ROLE ON *.* TO `msandbox`@`localhost`
*************************** 2. row ***************************
Grants for msandbox@localhost: GRANT APPLICATION_PASSWORD_ADMIN,AUDIT_ABORT_EXEMPT,AUDIT_ADMIN,AUTHENTICATION_POLICY_ADMIN,BACKUP_ADMIN,BINLOG_ADMIN,BINLOG_ENCRYPTION_ADMIN,CLONE_ADMIN,CONNECTION_ADMIN,ENCRYPTION_KEY_ADMIN,FIREWALL_EXEMPT,FLUSH_OPTIMIZER_COSTS,FLUSH_STATUS,FLUSH_TABLES,FLUSH_USER_RESOURCES,GROUP_REPLICATION_ADMIN,GROUP_REPLICATION_STREAM,INNODB_REDO_LOG_ARCHIVE,INNODB_REDO_LOG_ENABLE,PASSWORDLESS_USER_ADMIN,PERSIST_RO_VARIABLES_ADMIN,REPLICATION_APPLIER,REPLICATION_SLAVE_ADMIN,RESOURCE_GROUP_ADMIN,RESOURCE_GROUP_USER,ROLE_ADMIN,SENSITIVE_VARIABLES_OBSERVER,SERVICE_CONNECTION_ADMIN,SESSION_VARIABLES_ADMIN,SET_USER_ID,SHOW_ROUTINE,SYSTEM_USER,SYSTEM_VARIABLES_ADMIN,TABLE_ENCRYPTION_ADMIN,XA_RECOVER_ADMIN ON *.* TO `msandbox`@`localhost`
*************************** 3. row ***************************
Grants for msandbox@localhost: GRANT `R_DO_IT_ALL`@`%` TO `msandbox`@`localhost`
3 rows in set (0.01 sec)

and

mysql [localhost:5731] {msandbox} ((none)) > show grants\G
*************************** 1. row ***************************
Grants for msandbox@localhost: GRANT ALL PRIVILEGES ON *.* TO 'msandbox'@'localhost'
1 row in set (0.00 sec)

In mysql 5.7 we they don't list all the privileges, but just GRANT ALL PRIVILEGES...

@GuptaManan100 GuptaManan100 removed the NeedsIssue A linked issue is missing for this Pull Request label Nov 22, 2023
@deepthi
Copy link
Member

deepthi commented Nov 22, 2023

We definitely need the DBA grants part of this, because we don't want vttablet to start populating the DBA pool with connections where the right grants are not yet present.
However, the mycnf part of it we may be able to skip. The reason for this is that in k8s environments, the principle we follow is that each component comes up, checks if it can continue start up, dies if not, and restarts.
So what we actually see happening with vttablet if my.cnf is not available is an error like this

vttablet.go:171] mycnf read failed: open /vt/vtdataroot/vt_0000000100/my.cnf: no such file or directory

This is a fatal error, so vttablet shuts down and restarts. Usually by the time it comes up again, my.cnf file is available and the end result is something like the following in the pod status.

mysqlctld container:

    State:          Running
      Started:      Tue, 21 Nov 2023 11:45:13 -0800

vttablet container:

    State:          Running
      Started:      Tue, 21 Nov 2023 11:45:15 -0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 21 Nov 2023 11:45:13 -0800
      Finished:     Tue, 21 Nov 2023 11:45:13 -0800
    Ready:          True
    Restart Count:  1

h/t @dctrwatson for explaining this to me.

This also suggests an alternate implementation for the DBA grants issue. Instead of waiting for the grants to be available, we check whether they are present, and if they aren't we simply exit / shutdown with an error. And that can happen as many times as needed until the check succeeds.

@GuptaManan100
Copy link
Member Author

@deepthi So what do we want to do? Remove the flags with a default value? Remove the wait for the file to show up because vttablet would crash anyways?

@GuptaManan100 GuptaManan100 removed the NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work label Nov 24, 2023
@dbussink
Copy link
Contributor

dbussink commented Nov 24, 2023

And that can happen as many times as needed until the check succeeds.

The main issue with this approach is that you still get a bunch of errors logged and pod restarts. While that's not considered a problem normally in the k8s world, I think it can still be pretty confusing and it's also not really optimal in how quickly the system comes up (Vitess waiting for a bit is going to be faster than k8s noticing the failure and restarting).

A bit of resiliency like this in the PR with a (maybe even shorter) timeout would still be a nice improvement imho to improve also the operational experience (next to fixing the actual bug here of course).

@GuptaManan100
Copy link
Member Author

I'll move the mycnf file wait into a separate PR and hard-code the dba-grants wait to 10seconds so that we can get this bug fix in. WDYT?

@deepthi
Copy link
Member

deepthi commented Nov 27, 2023

I'll move the mycnf file wait into a separate PR and hard-code the dba-grants wait to 10seconds so that we can get this bug fix in. WDYT?

Sounds good.

@deepthi deepthi removed the NeedsWebsiteDocsUpdate What it says label Nov 27, 2023
@deepthi deepthi changed the title Fix Vttablet by making it wait for the mycnf file and vt_dba user to be granted the privileges Make vttablet wait for vt_dba user to be granted privileges Nov 27, 2023
go/cmd/vttablet/cli/cli.go Outdated Show resolved Hide resolved
go/cmd/vttablet/cli/cli.go Outdated Show resolved Hide resolved
@dbussink dbussink merged commit f979961 into vitessio:main Nov 28, 2023
116 checks passed
@dbussink dbussink deleted the vttablet-mysqlctl-race branch November 28, 2023 09:10
return nil
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add logs for the err != nil cases?

Copy link
Member Author

@GuptaManan100 GuptaManan100 Nov 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed this review comment - #14632

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Cluster management Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants