add mysql-shell engine as a supported engine #586
Conversation
The CRD needs to be re-generated, which is achievable by running:

make generate && kustomize build ./deploy > build/_output/operator.yaml

The test code must also be updated: take the content of build/_output/operator.yaml and copy it into ./test/endtoend/operator/operator-latest.yaml. Note that you will need to revert the changes made to lines 6889 and 6893 before committing.

Once done, the content of operator-latest.yaml will need to be copied over to the Vitess repository, into the file ./examples/operator/operator.yaml.
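For convenience, the steps above can be scripted roughly as follows (a sketch only; the revert of lines 6889 and 6893 stays manual, and the path to the Vitess checkout is a placeholder):

```sh
# Regenerate the CRD and the bundled operator manifest.
make generate && kustomize build ./deploy > build/_output/operator.yaml

# Refresh the end-to-end test manifest from the freshly built output.
cp build/_output/operator.yaml test/endtoend/operator/operator-latest.yaml
# Remember to manually revert the changes to lines 6889 and 6893 of
# operator-latest.yaml before committing.

# Copy the result into a local Vitess checkout (placeholder path).
cp test/endtoend/operator/operator-latest.yaml ../vitess/examples/operator/operator.yaml
```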
For reviewers, since the PR on vitess is not merged yet, the Docker images containing the new backup engine are not yet available. We must build and push Docker images of the Vitess PR to a test registry in order to test this change correctly.
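For reference, one way to do that is sketched below (assuming the standard docker/lite build in the vitess repository; the registry and tag names are placeholders):

```sh
# From a checkout of the Vitess PR branch: build the lite image and push it
# to a personal/test registry so the operator can pull it.
docker build -f docker/lite/Dockerfile -t <your-registry>/lite:pr-16295 .
docker push <your-registry>/lite:pr-16295
```

The VitessCluster images (vtctld, vtgate, vttablet, vtorc, vtbackup) then need to point at that pushed tag, as in the YAML shown further down in this thread.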
@frouioui I can't see the tests, but I imagine they are breaking because of the Docker image? If this looks okay, can we copy the contents of the file back to vitessio/vitess#16295?
Some extra whitespace was introduced here. Otherwise @rvrangel, I think once vitessio/vitess#16295 is merged, we should come back to this Pull Request and add a real E2E test that uses the new backup engine; such a test already exists.
In the meantime, I have tested this Pull Request with a custom Docker build of vitessio/vitess#16295, and here is how it looks:
# ./test/endtoend/operator/101_initial_cluster_backup.yaml
...
spec:
  backup:
    engine: mysqlshell
    locations:
      - volume:
          hostPath:
            path: /backup
            type: Directory
  images:
    vtctld: frouioui/lite:latest
    vtgate: frouioui/lite:latest
    vttablet: frouioui/lite:latest
    vtorc: frouioui/lite:latest
    vtbackup: frouioui/lite:latest
    mysqld:
      mysql80Compatible: mysql:8.0.30
    mysqldExporter: prom/mysqld-exporter:v0.11.0
...
...
// pkg/operator/vttablet/backup.go:42
func init() {
	vttabletFlags.Add(func(s lazy.Spec) vitess.Flags {
		spec := s.(*Spec)
		if spec.BackupLocation == nil || spec.Mysqld == nil {
			return nil
		}
		flags := vitess.Flags{
			"restore_from_backup":          true,
			"restore_concurrency":          restoreConcurrency,
			"wait_for_backup_interval":     waitForBackupInterval,
			"backup_engine_implementation": string(spec.BackupEngine),
		}
		switch spec.BackupEngine {
		case planetscalev2.VitessBackupEngineXtraBackup:
			// When vttablets take backups, we let them keep serving, so we
			// limit to single-threaded to reduce the impact.
			backupThreads := 1
			// When vttablets are restoring, they can't serve at the same time
			// anyway, so let the restore use all available CPUs for this Pod.
			// This is cheating a bit, since xtrabackup technically counts
			// against only the vttablet container, but we allow CPU bursting,
			// and we happen to know that our mysqld container is not using its
			// CPU request (reservation) during restore since it's stopped.
			mysqlCPU := spec.Mysqld.Resources.Requests[corev1.ResourceCPU]
			vttabletCPU := spec.Vttablet.Resources.Requests[corev1.ResourceCPU]
			restoreThreads := int(mysqlCPU.Value() + vttabletCPU.Value())
			if restoreThreads < 1 {
				restoreThreads = 1
			}
			flags.Merge(xtrabackupFlags(spec, backupThreads, restoreThreads))
		case planetscalev2.VitessBackupEngineMySQLShell:
			svm := vitessbackup.StorageVolumeMounts(spec.BackupLocation)
			flags.Merge(vitess.Flags{
				"mysql-shell-backup-location": svm[0].MountPath,
			})
		}
		clusterName := spec.Labels[planetscalev2.ClusterLabel]
		storageLocationFlags := vitessbackup.StorageFlags(spec.BackupLocation, clusterName)
		return flags.Merge(storageLocationFlags)
	})
...
Until we fix the hostname/socket issue this is blocked. Let me know if you want me to help implement step 5.
Hey @frouioui, I would appreciate the help on step 5 as we don't use the vitess-operator and I don't have a lot of context to make that change. Would the change you mentioned be enough (if that's the case, please push it to this PR too), or do we need something else to make it work?
I pushed the fix through 4d4318c.
@rvrangel The change mentioned would not be enough. We will need to implement https://github.com/vitessio/vitess/pull/16295/files#r1686498163 in vitess-operator; this is what's blocking step 7 and the end of the manual test. I will work on this now and update you.
I fixed how we take a backup in 39d0d7f. Now I am facing issues with the restore phase. In our tests, we bring up a vitess cluster, take a backup on it, and tear it down. We then re-create the vitess cluster, the data is restored from the disk and volumes, and shortly after we try to restore from backup since an available backup was found. Since the tables already exist in the destination, we are left with an error. I tried excluding some tables with the load flag.
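For concreteness, the flow described above looks roughly like this when reproduced by hand (a sketch using the manifests referenced elsewhere in this PR; the actual e2e suite automates and extends these steps):

```sh
# Bring up the operator and a cluster configured with the mysqlshell engine.
kubectl apply -f test/endtoend/operator/operator-latest.yaml
kubectl apply -f test/endtoend/operator/101_initial_cluster_backup.yaml

# ... take a backup, then tear the cluster down (the backup and the data
# volumes survive on the host) ...
kubectl delete -f test/endtoend/operator/101_initial_cluster_backup.yaml

# Re-create the cluster: vttablet finds the existing backup and attempts a
# restore on top of the data that is already there, which triggers the
# "tables already exist" errors mentioned above.
kubectl apply -f test/endtoend/operator/101_initial_cluster_backup.yaml
```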
@frouioui could you provide a bit more detail on how it is re-created? When you say the data is restored from disk, do you mean you restore previous snapshots of the data (so tables and schema would already exist on disk) while starting it up? From the code comments, it seems that if you are starting a tablet with existing data, it needs to be wiped out first.
Let's join discussions in vitessio/vitess#16295 (comment).
@rvrangel, with the latest version of the vitess PR, I am still facing the same issue.
@frouioui given the defaults have been changed here, is it possible you were using an older version of that branch? The restore options in the log you provided suggest so. All databases will be deleted by the restore process on the latest version.
Sorry for the oversight, I was not on the right commit. I ran the tests again, and I am facing a different issue: restoring from a backup fails when starting mysqld. In the mysqld container, after starting correctly, the same snippet of log repeats itself.
In the vttablet container, it seems like restoring from the backup with the new engine works correctly until we try to start mysqld.
It seems like we cannot clean up the stale lock file, because it is not stale and is still used by, in this case, process 72. We attempt to do this here and end up failing there. Have you encountered that issue outside of the vitess-operator before?
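For illustration, the check being described amounts to something like the following (a minimal sketch of the general technique, assuming the lock file stores the owning PID; the path is a placeholder and this is not the actual vitess code):

```sh
# Only remove the lock file if the process that created it is gone.
lock=/path/to/mysqlx.sock.lock   # placeholder path
pid=$(cat "$lock")
if kill -0 "$pid" 2>/dev/null; then
  echo "lock $lock is still held by running process $pid; refusing to remove it"
else
  rm -f "$lock"
fi
```

In the failure above, process 72 is still alive, which is why the cleanup attempt fails.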
Ah, I believe we did in fact, but we have a custom setup on our side. The problem when testing on live tables was that Vitess always assumes it needs to start mysqld.
@rvrangel, I will test with the new commit today and come back to you with the results.
@rvrangel with the latest commit on the Vitess PR, I am able to take a backup and restore from that backup correctly. The exact same test passes with both backup engines. In my test, before and after the restore phase I verify certain parameters in the database; to do so I connect to the database, and that connection fails after the restore.
Standard vitess config leaves …
Great to hear that @frouioui!! @deepthi those users are added back by the restore process.
I mentioned this briefly in this comment. The main problem is how the MySQL users are handled: after the restore process is completed, we should have most users in the same state they were in when the backup started, with a few exceptions.
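As a quick way to check that behaviour by hand (an illustrative snippet, not part of the e2e suite; connection options are omitted and depend on the cluster):

```sh
# Snapshot the account list before taking the backup...
mysql -e "SELECT user, host FROM mysql.user ORDER BY user, host;" > users_before.txt

# ...then again after the restore completes, and compare the two.
mysql -e "SELECT user, host FROM mysql.user ORDER BY user, host;" > users_after.txt
diff users_before.txt users_after.txt
```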
I took the liberty of pushing another commit to this branch: 134463b adds the missing grant. I am no longer seeing any issue with my tests after using the latest version of the Vitess branch and the fix I just pushed.
Awesome, thanks @frouioui. I see some tests are failing, but I can't see them; are those expected because we haven't merged the other PR yet?
Looks good to me!
The failures seemed unrelated to this PR; I restarted the workflows.
This change is dependent on vitessio/vitess#16295 (review), as we are adding a new backup engine to Vitess. See the comment for additional context.