Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: wrong regex for wal retention #1026

Merged
merged 6 commits into from
Dec 6, 2024
Merged

Conversation

nickmansrob
Copy link
Contributor

@nickmansrob nickmansrob commented Sep 16, 2024

Fixes #1015

Credits to @andrewfung729 for providing the fix.

Created a PR out of this because I also encountered this issue and needed a fix relatively quickly.

For the maintainers, could you give an estimate on when spilo gets released and when the postgres operator is also bumped? If this could take a while, I may create my own monkey patch until this is released.

Thanks!

@nickmansrob
Copy link
Contributor Author

I also was wondering why the script does not use following command: wal-g delete retain FULL $DAYS_TO_RETAIN
This is probably an easier way instead of calculating the $BEFORE variable

Copy link

@emrahbecer emrahbecer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new version of wal-g uses the header name "backup_name" instead of "name". The regex script should be modified to adapt to this

@bo0ts
Copy link

bo0ts commented Oct 21, 2024

Isn't this going to break wal-e because it still prints name in version 1.1.1?

@nickmansrob
Copy link
Contributor Author

nickmansrob commented Oct 26, 2024

@bo0ts @emrah83 I updated my regex to be backwards compatible with wal-e. The regex script already has been updated, I think, given that you are referring to this script.

@fgalind1
Copy link

fgalind1 commented Nov 9, 2024

any chance this would get approved and merged, we're facing also the same issue and tested that this fixes the issue

@Yingrjimsch
Copy link

If this is already touched, can we discuss if there is an option to have BACKUP_NUM_TO_RETAIN really be the number of backups and not the DAYS_TO_RETAIN this is confusing if I have multiple backups a day and then there are still more backups available then set by me via BACKUP_NUM_TO_RETAIN

@nickmansrob
Copy link
Contributor Author

@Yingrjimsch I'd advise you to create a separate issue for this :)

@nickmansrob
Copy link
Contributor Author

I'd like to see this merged asap, because I think it will take some time to get the version of spilo bumped on the operator, might want to use a custom build for the time being.

Copy link

@emrahbecer emrahbecer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
This change has better be merged asap because we cannot tolerate backup files to accumulate infinitely.

Copy link

@MatzHilven MatzHilven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@OlleLarsson
Copy link
Contributor

Hi, would it be possible for either of you to take a look at this PR @hughcapet @FxKu 😃 ?

@FxKu FxKu added the bugfix Bug fixes and patches, e.g. fixing of a production issue that is affecting the customer experience. label Nov 19, 2024
@emrahbecer
Copy link

Great. I think we now have to wait for the next tag or release for the new image. Right?

@Yingrjimsch
Copy link

Yingrjimsch commented Nov 21, 2024

For anyone who needs a temporary fix. I have written a small script which loops over the databases in your cluster and edits the crontab/postgres file so the wal retention regex is correct.

In my case all databases are identifiable by the substring database, feel free to change that for your usecase 😄

⚠️ This TEMPORARY fix is removed as soon as the db pod restarts.

#!/bin/bash

# Find all pods containing the substring "database" with the label spilo-role=master
pods=$(kubectl get pods --all-namespaces -l spilo-role=master -o jsonpath='{range .items[?(@.metadata.name contains "database")]}{.metadata.namespace} {.metadata.name}{"\n"}{end}')

# Check if any pods were found
if [ -z "$pods" ]; then
  echo -e "${RED}No matching pods found.${NC}"
  exit 1
fi

# Loop through each found pod
echo "$pods" | while read -r namespace pod; do

  # Adding some spacing between pods
  echo -e "\n\n==========================="
  echo "Processing pod: $pod in namespace: $namespace"
  echo "==========================="

  # Execute pg_dump inside the pod and store the output in a file inside the pod's /tmp directory
  echo "Fixing postgres_backup for database: $db_name in pod: $pod..."
  kubectl --kubeconfig="$KUBECONFIG_PATH" exec -n "$namespace" "$pod" -- bash -c "sed -i 's|/scripts/postgres_backup.sh \"|backup -s \"|' /var/spool/cron/crontabs/postgres && sed -i 's|envdir|cat /scripts/postgres_backup.sh \| sed '\''s/\\^name/\\^\\\\(backup_\\\\)\\\\?name/g'\'' \| envdir|' /var/spool/cron/crontabs/postgres"
done

@fgalind1
Copy link

any updates on this PR to get it merged that fixes the current branch? 👍

@FxKu
Copy link
Member

FxKu commented Nov 27, 2024

Next week, I hope. End of november we always have to follow a stricter merge and deployment policy.

@hughcapet
Copy link
Member

👍

1 similar comment
@idanovinda
Copy link
Member

👍

@hughcapet hughcapet merged commit 39a6c8f into zalando:master Dec 6, 2024
4 checks passed
hughcapet pushed a commit that referenced this pull request Dec 6, 2024
hughcapet added a commit that referenced this pull request Dec 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix Bug fixes and patches, e.g. fixing of a production issue that is affecting the customer experience.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

wal-g backup retention feature not working
10 participants