Auto recovery in case of locally-missing remote snapshot, other improvements #13
base: master
Conversation
- Improved consistency: if the target machine is "localhost", ssh is not used at all (solves the "ssh -n" issue)
- Added a description of the parameters expected in the config file, and a reminder to set properties on datasets
- `zfs` command location is now auto-discovered
- Added a sudo fallback, for machines that do not have pfexec
- Added an automatic recovery option for the case where the latest remote snapshot does not exist locally (new MAXAGE parameter). If enabled, up to $MAXAGE-1 older remote snapshots are searched among the local ones, and the match acts as the new incremental base. This prevents maintenance mode, but be aware that non-matched snapshots newer than the matching one will be destroyed before being replaced by the even newer ones. Cloning the non-matched remote snapshots would be one (arguably unclean) way around this, but it has not been implemented.
- Added `zfs hold` on the local snapshot, to ensure retention before the send operation.
- Removed redundant code ($RECENT is now used programmatically, instead of branching)
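The auto-discovery and fallback behavior described above could be sketched roughly like this (variable names such as `REMHOST` and `REMZFS_CMD` are illustrative here, not necessarily the script's actual identifiers):

```shell
# Auto-discover the zfs binary; fall back to a common default location.
ZFS=$(command -v zfs || echo /usr/sbin/zfs)

# Prefer pfexec (Solaris/illumos); fall back to sudo where it is missing.
if command -v pfexec >/dev/null 2>&1; then
    PFEXEC=pfexec
else
    PFEXEC=sudo
fi

# Example target; when it is "localhost", skip ssh entirely.
REMHOST="localhost"
if [ "$REMHOST" = "localhost" ]; then
    REMZFS_CMD="$PFEXEC $ZFS"
else
    REMZFS_CMD="ssh $REMHOST $PFEXEC $ZFS"
fi
```

Running commands directly for localhost avoids the `ssh -n` stdin problem mentioned above, since there is no ssh process competing for the pipe at all.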
Sorry, noticed there was a bug as I was testing with multiple datasets:
- Fix a regression in the last commit, which executed `zfs hold` for each dataset but `zfs release` only for the last one. `release` is now called right after the send/receive, and then the trap is cleared.
- Schedule the `release` before even calling `hold`, to guarantee its execution.
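The ordering in that fix could be sketched as follows (snapshot and tag names are examples, and `zfs` is stubbed with a shell function so the sketch runs without a real pool; the real script calls the actual binary):

```shell
# Stub so this sketch is runnable standalone; remove in real use.
zfs() { echo "zfs $*"; }

SNAP="tank/data@backup"
TAG="zfs-backup"

# Schedule the release *before* taking the hold, so it still runs
# if the script dies between the two commands.
trap 'zfs release "$TAG" "$SNAP"' EXIT

zfs hold "$TAG" "$SNAP"

# ... the zfs send/receive for this dataset would happen here ...

# After a successful transfer, release immediately and clear the trap,
# so the next dataset in the loop gets its own hold/release pair.
zfs release "$TAG" "$SNAP"
trap - EXIT
```

Clearing the trap per dataset is what fixes the regression: with a single EXIT trap and no per-iteration release, only the last dataset's hold was ever released.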
Thank you for your contribution! I've done a quick read-through and seen enough little things here and there that I don't want to merge as-is, and I need to take time to read the recovery logic in depth. It's a good idea, but we need to make sure it is doing what we expect it to. Also, I'm not sure MAXAGE is the right name if it's based on a number of snapshots rather than a date? Quick notes:
The recovery logic is very simple: if the latest remote snapshot does not exist locally, check whether the next-older remote one does, and retry up to $MAXAGE times (or until all remote snapshots have been checked, if MAXAGE=0). I agree that it's not the best name... maybe something like MAXREMOTERECENT, as it resembles your RECENT variable in some ways? About those issues:
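That search could be sketched like this (a simplified standalone model, not the patch's actual code: `remote_snaps`/`local_snaps` stand in for the real snapshot listings, and `local_has` is a hypothetical helper):

```shell
# Remote snapshots, newest first; local snapshots as a plain list.
remote_snaps="tank@d tank@c tank@b tank@a"
local_snaps="tank@b tank@a"

# Hypothetical helper: does this snapshot exist locally?
local_has() {
    case " $local_snaps " in *" $1 "*) return 0 ;; *) return 1 ;; esac
}

MAXAGE=5
i=0
BASE=""
for snap in $remote_snaps; do
    if local_has "$snap"; then
        BASE="$snap"    # newest remote snapshot that also exists locally
        break
    fi
    i=$((i + 1))
    # MAXAGE=0 means "search all remote snapshots".
    [ "$MAXAGE" -ne 0 ] && [ "$i" -ge "$MAXAGE" ] && break
done

if [ -z "$BASE" ]; then
    echo "no common snapshot found; entering maintenance mode" >&2
fi
```

In this example the search skips `tank@d` and `tank@c` (missing locally) and settles on `tank@b` as the incremental base.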
This solves a few issues, as well as adding a bit more automation to remote-local snapshot matching:

- `zfs hold` on the local snapshot before the send operation, and `zfs release` on program exit, to ensure existence and retention.
- If the target machine is `localhost`, ssh is not used at all (`ssh -n` is not necessary anymore).
- Uses `sh` instead of `ksh`; falls back to `sudo` if `pfexec` is not found; the `zfs` command location is now auto-discovered.

The auto recovery feature basically only changes the way the base of the incremental snapshot is obtained: if the latest remote snapshot is missing on the local machine, instead of just going into maintenance mode, zfs-backup tries to find an older snapshot that exists locally, and uses that as the `zfs send -I` base instead. Up to $MAXAGE-1 snapshots older than the latest remote are checked; you can configure MAXAGE in the config file.
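Once a base is chosen, the transfer itself is a standard incremental send. A minimal sketch, with example dataset/host names and with `zfs`/`ssh` stubbed as shell functions so it runs standalone (the real script invokes the actual binaries):

```shell
# Stubs so this sketch is runnable without a pool; remove in real use.
zfs() { echo "zfs $*"; }
ssh() { shift; "$@"; }    # stub: run the "remote" command locally

BASE="tank/data@monday"     # newest remote snapshot that also exists locally
LATEST="tank/data@friday"   # newest local snapshot

# `zfs send -I` sends all intermediate snapshots between $BASE and $LATEST.
# `zfs receive -F` forces a rollback of the target before receiving, which
# is where non-matched remote snapshots newer than $BASE get destroyed.
zfs send -I "$BASE" "$LATEST" | ssh backuphost zfs receive -F backup/data
```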