feat: support redis(standalone replica) pitr #961

Chiwency · 2024-08-19T09:30:07Z

We leverage the AOF backup feature available from Redis 7.0 onwards to implement Point-In-Time Recovery (PITR). By setting the aof-timestamp-enabled yes and aof-disable-auto-gc no parameters, we activate AOF timestamp annotations while disabling the automatic cleanup of historical AOF files. A StatefulSet runs the continuous backup process and manages and tracks the AOF files.

The continuous backup process works as follows:

1. Historical AOF files in the data directory are compressed, packaged, and pushed to the backup repository.
2. The latest AOF file is backed up to the repository directory and continuously updated. When Redis triggers AOF rewriting, the file is compressed and archived as described in step 1.
3. The backup files in the repository are named using the format ${base_file_ctime}.${seq}.${suffix}, where base_file_ctime is the creation time of the base AOF file, marking the start of this particular backup package. This timestamp creates a continuous timeline for all backups. The seq is a sequence number used for recovery in case of a backup pod failure.
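
To illustrate the naming format, here is a minimal sketch of helpers that build and split such names. The function names are hypothetical and not taken from pitr-backup.sh, and the example assumes base_file_ctime is a Unix epoch timestamp and that the suffix may itself contain dots (for example tar.zst).

```shell
# Hypothetical helpers for names of the form ${base_file_ctime}.${seq}.${suffix}.
# Assumption: base_file_ctime is a Unix epoch timestamp (seconds).

# Build a backup file name from its three parts.
backup_name() {
  local base_file_ctime="$1" seq="$2" suffix="$3"
  printf '%s.%s.%s\n' "${base_file_ctime}" "${seq}" "${suffix}"
}

# Print the base file creation time (first dot-separated field).
backup_base_ctime() {
  local name="$1"
  printf '%s\n' "${name%%.*}"
}

# Print the sequence number (second dot-separated field).
backup_seq() {
  local rest="${1#*.}"   # drop everything up to and including the first dot
  printf '%s\n' "${rest%%.*}"
}
```

Because only the first two fields are split off, a multi-part suffix is left intact.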

The recovery process works as follows:

Based on DP_RESTORE_TIME, the backup package whose base_file_ctime is closest to DP_RESTORE_TIME is selected. This base file's data is used as the starting point. Then the AOF files are processed according to their timestamp annotations until DP_RESTORE_TIME is reached.
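
The selection step could look roughly like the sketch below. It is not the code from this PR: the function name is hypothetical, both timestamps are assumed to be Unix epoch seconds, and "closest" is interpreted as the latest base_file_ctime that is not later than the restore time, since a base created after the restore point cannot serve as a starting point.

```shell
# Hypothetical helper: choose the backup package to restore from.
# Reads backup file names (${base_file_ctime}.${seq}.${suffix}) on stdin and
# prints the largest base_file_ctime that is not later than the restore time.
# Assumption: both values are Unix epoch seconds.
select_base_ctime() {
  local restore_ts="$1" best="" name ctime
  while IFS= read -r name; do
    ctime="${name%%.*}"
    # Skip names whose first field is not a plain integer.
    case "${ctime}" in ''|*[!0-9]*) continue ;; esac
    if [ "${ctime}" -le "${restore_ts}" ] &&
       { [ -z "${best}" ] || [ "${ctime}" -gt "${best}" ]; }; then
      best="${ctime}"
    fi
  done
  # Return non-zero if no package starts at or before the restore time.
  [ -n "${best}" ] && printf '%s\n' "${best}"
}
```

A caller would pipe a listing of the repository into the function, for example `ls "${repo_dir}" | select_base_ctime "${restore_ts}"`, where repo_dir and restore_ts are placeholders.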

ldming · 2024-08-27T06:56:00Z

@Y-Rookie could you take a look?

Y-Rookie · 2024-08-27T12:25:10Z

addons/redis/config/redis7-config.tpl

@@ -43,9 +43,10 @@ appendfsync everysec
 no-appendfsync-on-rewrite no
 auto-aof-rewrite-percentage 100
 auto-aof-rewrite-min-size 64mb
+aof-disable-auto-gc no


What is the default value of this parameter, and what is its purpose? I couldn't find it in the official redis.conf file. The issue redis/redis#10561 mentions that it is only a testing parameter that exists in the internal code.

The default value is no. I mentioned this parameter only to make clear that I use it. I actually set it to yes in the backup target pod, and only when a user triggers PITR. As far as I can see, using this testing parameter doesn't affect existing clusters or users who don't need PITR.

shanshanying · 2024-08-27T12:59:01Z

Hi @Chiwency,

After some discussion with my colleagues, we suggest:

1. Add the new parameters to configConstraint (are they static, dynamic, or immutable parameters?) to make sure they are updated in the expected way.
2. Do not modify existing parameters' default values. Otherwise, other users who are using Redis now will be affected.
3. Instead, write a brief description of how to trigger a PITR, e.g. create cluster -> reconfiguration (to update parameters) -> PITR.

Chiwency · 2024-08-27T13:25:36Z

@shanshanying I get it.

I will revert aof-timestamp-enabled to its default value, no.
I have a question about point 2. PITR is not natively supported by Redis, so to implement it, for users who enable PITR's continuous backup, the backup script changes the two parameters aof-disable-auto-gc and aof-timestamp-enabled to yes on the target Redis instance. Is that OK? It won't affect existing clusters or users who don't use PITR.
For point 3, the two parameters only take effect on the single target instance, and users don't know which instance is selected for backup, so I prefer to set them by script.

shanshanying · 2024-08-28T03:48:18Z

Hi @Chiwency,

I have a question about point 2...

The point is: if the change of parameters will affect existing RUNNING Redis clusters, we'd better make it in a more EXPLICIT way: tell users how to modify the values manually, either by OpsRequest or by editing some ConfigMap, and write a doc to explain the side effects.
I think we can update parameters through our "Reconfig" OpsRequest, can't we?

Chiwency · 2024-08-28T04:29:08Z

Hi @shanshanying
How should I understand "affect existing RUNNING Redis clusters"? Since I reverted the config file changes, users who don't trigger a PITR backup will not be affected, and nothing changes for them. But when users apply PITR to an existing RUNNING Redis cluster, there are two scenarios:

IMPLICIT parameter modification
The user triggers PITR with a single command, and the script then changes the values of aof-timestamp-enabled and aof-disable-auto-gc. These two parameters only apply to the AOF logs. From the user's perspective, the parameter changes will not be noticeable.

kbcli cluster update <redis-cluster-name> --backup-enabled=true --backup-method=aof

EXPLICIT parameter modification
The user first uses "Reconfig" to set aof-timestamp-enabled to yes, then triggers the PITR backup.

kbcli cluster edit-config <redis-cluster-name>
kbcli cluster update <redis-cluster-name> --backup-enabled=true --backup-method=aof

Whether it's Option 1 or Option 2, documentation will be provided later to explain the changes. My key question is whether these parameters should be managed by users, and whether these parameter modifications count as affecting the existing cluster.
As I see it, updating parameters through the "Reconfig" OpsRequest may be an additional burden for users, both to understand and to operate. And since the PITR feature relies on these two parameters, the two commands in Option 2 are effectively one atomic command.
What is your opinion on this?

nayutah · 2024-09-03T06:26:03Z

Two independent operations are better for open-source users: first set the flag, then use PITR. If someone uses PITR with AOF timestamps disabled, please return an error code and exit immediately.

nayutah · 2024-09-03T06:33:59Z

addons/redis/dataprotection/pitr-backup.sh

+  DP_save_backup_status_info "${total_size}" "${start_time}" "$(date +%s)"
+}
+
+function check_conf() {


Also check aof-timestamp-enabled here. The function can fail and return an error when either of them is disabled. Prompt the user to manually edit the config items with 'kbcli edit-config' or 'kubectl + reconfigure ops'.
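
A check along these lines might be sketched as follows. This is not the check_conf from the PR: the function names are hypothetical, connection and authentication options for redis-cli are omitted, and it assumes that both parameters can be read by exact name with CONFIG GET and must both be yes, as discussed earlier in this thread.

```shell
# Hypothetical sketch; the real check_conf in pitr-backup.sh may differ.

# Validate the two values. Print a hint and return non-zero if either is not "yes".
require_pitr_conf() {
  local aof_timestamp_enabled="$1" aof_disable_auto_gc="$2"
  if [ "${aof_timestamp_enabled}" != "yes" ] || [ "${aof_disable_auto_gc}" != "yes" ]; then
    echo "PITR requires aof-timestamp-enabled=yes and aof-disable-auto-gc=yes;" \
         "update them with a reconfigure operation and retry." >&2
    return 1
  fi
}

# Read one config value from a running instance. When its output is piped,
# redis-cli prints the name and then the value on separate lines, so only
# the last line is kept.
get_conf() {
  redis-cli CONFIG GET "$1" | tail -n 1
}

# Example wiring (needs a reachable Redis instance, so it is left commented out):
# require_pitr_conf "$(get_conf aof-timestamp-enabled)" \
#                   "$(get_conf aof-disable-auto-gc)" || exit 1
```

Keeping the comparison separate from the redis-cli call lets the validation be exercised without a running instance.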

Chiwency · 2024-09-04T08:43:42Z

Hi @nayutah
Because kbcli doesn't offer complex validation logic that involves querying the API server, it relies on validation by the manager after the opsReq is generated. Therefore, we cannot return error codes immediately in kbcli. Instead, errors are detected through cluster status queries. The same applies to the restore-in-time parameter: its time validation is done in the manager controller. I think enhancing kbcli's validation capabilities is a direction for future optimization.

feat: support redis(standalone replica) pitr

49975ed

Chiwency requested review from nayutah, ldming, heng4fun, free6om, wangyelei, leon-inf and shanshanying as code owners August 19, 2024 09:30

fix: use current time as the end time

663854e

nayutah reviewed Aug 21, 2024

addons/redis/dataprotection/common-scripts.sh

addons/redis/dataprotection/pitr-backup.sh (outdated)

addons/redis/dataprotection/pitr-backup.sh

wangyelei reviewed Aug 23, 2024

addons/redis/templates/backuppolicytemplate.yaml (outdated)

addons/redis/dataprotection/common-scripts.sh

wangyelei reviewed Aug 23, 2024

addons/redis/templates/backupactionset.yaml

addons/redis/dataprotection/pitr-backup.sh

fix: some corner case bug and resolve cr

5c46ef0

Chiwency force-pushed the redis-pitr branch from 6fe0df9 to 5c46ef0 Compare August 26, 2024 07:53

nayutah approved these changes Aug 26, 2024

wangyelei approved these changes Aug 26, 2024

ldming reviewed Aug 27, 2024

addons/redis/config/redis7-config.tpl (outdated)

ldming mentioned this pull request Aug 27, 2024

feat: support redis(standalone replica) pitr apecloud/kubeblocks#7998

Merged

Y-Rookie reviewed Aug 27, 2024

fix: revert config file change

df2454d

nayutah reviewed Sep 3, 2024

chore: reconfigure before pitr start

5cab015

ldming merged commit 9e2368f into apecloud:release-0.9 Sep 10, 2024

Chiwency mentioned this pull request Sep 11, 2024

feat: support redis(standalone replica) pitr #1024

Merged
