A rudimentary system for backing up files.
THIS BACKUP SYSTEM IS NOT WELL TESTED AND YOU SHOULD NOT USE IT.
- An `ISource` is a source of files to be backed up. The only `ISource` currently implemented is the `FileSystemSource`, which will read files from the file system: everything under a base path (unless filtered out).
- An `IDestination` is a store where largish chunks of data can be stored. A destination can optionally support partial reads by implementing `IDestinationWithPartialRead`. There are a few implemented:
  - `FileSystemDestination` dumps data into a tree on a normal file system. This implements `IDestinationWithPartialRead`.
  - `S3Destination` writes files to an S3 bucket with a prefix. This implements `IDestinationWithPartialRead`.
  - `PackedDestination` wraps another `IDestination`, but packs data into large chunks.
  - `SplitDestination` wraps a primary `IDestinationWithPartialRead` and several secondary `IDestination`s, and sends each chunk to each of them in turn. Note: this will load the chunk into memory, so be careful if wrapping a `SplitDestination` with a `PackedDestination`, as `PackedDestination` will generate very large chunks. Reads only come from the primary `IDestination`. The primary destination must implement partial reads (`IDestinationWithPartialRead`). This implements `IDestinationWithPartialRead`.
If you use `FileSystemDestination` or `S3Destination` directly, you will have at least one file in the destination for every unique file in your source. With `PackedDestination`, files will be packed into large files and then flushed to the wrapped destination, making for significantly fewer files (but it is slower and must write the packed file to a temporary file before flushing to the final destination).
- The backup processor chunks files up and hashes them, eliminating duplicate files (and potentially duplicate chunks of data); see the sketch after this overview. It also gzip-compresses files that meet certain criteria, hopefully reducing the size of your backup archive.
- The archive is append-only, so it's not possible to remove old files; your collection of Scooby-Doo and Scrappy-Doo fan fiction will be safe forever despite your regrets.
- Supports client-side AES encryption.
- Can back up, restore, and verify files. Can also run a report of all files and versions within the archive.
- In my testing, it seems reasonably fast and uses a reasonable amount of memory, though more extensive testing on larger backup archives is needed.
- Makes no attempt to prevent you from doing dumb things, like backing up stuff with an encryption key and then losing the key.
- Does not use VSS or other techniques to read files that are in use. If a file is locked, it'll just have to skip it.
- Has no scheduling and no GUI. Must be run from the command line.
- Will probably do terrible things if you run multiple instances at once.
The most likely use-case for this would be to run as a scheduled job in the middle of the night on files that are unlikely to be locked by the OS, in conjunction with another backup system that is proven to work. Even then, NO WARRANTY, use at your own risk.
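To illustrate the dedup idea mentioned above, here is a minimal sketch of content-addressed chunk storage. This shows only the general technique, not the actual implementation; the class and member names are made up.

```csharp
// Sketch of content-addressed chunk storage: identical chunks hash to the same
// key, so each unique chunk is written only once. Hypothetical names throughout.
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public class DedupChunkStore
{
    private readonly HashSet<string> knownHashes = new HashSet<string>();
    private readonly string basePath;

    public DedupChunkStore(string basePath) => this.basePath = basePath;

    // Returns the chunk's hash; writes the chunk only if it hasn't been seen before.
    public string Put(byte[] chunk)
    {
        string hash;
        using (var sha = SHA256.Create())
            hash = BitConverter.ToString(sha.ComputeHash(chunk)).Replace("-", "");

        if (knownHashes.Add(hash))
            File.WriteAllBytes(Path.Combine(basePath, hash), chunk);

        return hash;
    }
}
```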
Backups are based on Backup Sets. A Backup Set is a JSON file that declares a Source of files (or whatever) to back up, and a Destination to put the backups. There are a couple of Sources to choose from, but I'd recommend using the (poorly named) `GroupedSource`, for reasons described below.

To build a custom source, implement `ISource` and update `SourceFactory`; a rough sketch follows the built-in sources below. The following are built-in:
`FileSystemSource` recursively lists all files from a single starting directory. Configuration consists of a single parameter, `basePath`, which is the base path from which to recursively list files.
Example config:

```json
"source": {
"type": "FileSystemSource",
"config": {
"basePath": "C:\\ImportantData"
}
}
```
`GroupedSource` is a collection of named sources. Generally prefer this over a single source, as you can add more sources without having to recreate your entire backup. A grouped source takes a list of `NamedSource` objects.
Example config:

```json
"source": {
"type": "GroupedSource",
"config": {
"sources": [
{
"name": "Data",
"source": {
"type": "FileSystemSource",
"config": {
"basePath": "C:\\Data"
}
}
},
{
"name": "Desktop",
"source": {
"type": "FileSystemSource",
"config": {
"basePath": "C:\\Users\\Me\\Desktop"
}
}
}
]
}
}
```
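As a rough illustration of what a custom source might look like (the `ISource` shape shown here is an assumption for the sake of the example; check the real interface and `SourceFactory` in the code before copying it):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Assumed interface shape, for illustration only; the real ISource may differ.
public interface ISource
{
    IEnumerable<(string Path, Func<Stream> Open)> GetFiles();
}

// Hypothetical custom source that backs up a fixed list of files.
public class FixedListSource : ISource
{
    private readonly string[] paths;

    public FixedListSource(params string[] paths) => this.paths = paths;

    public IEnumerable<(string Path, Func<Stream> Open)> GetFiles()
    {
        foreach (var path in paths)
            yield return (path, () => File.OpenRead(path));
    }
}
```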
To build a custom destination, implement either `IDestinationWithPartialRead` or `IDestination`; a rough sketch follows the built-in destinations below. If your provider can implement `IDestinationWithPartialRead`, it should. Destinations with partial read support are supported by `PackedDestination`, described below.
`FileSystemDestination` persists file chunks to the file system directly as files. It is the simplest destination, but depending on the underlying file system, it may cause a very large number of files and directories to be created. It is configured with a base path.
Example config:

```json
"destination": {
"type": "FileSystemDestination",
"config": {
"basePath": "E:\\Backup"
}
}
```
`PackedDestination` wraps another destination, but packs the blobs into larger files before persisting them to the wrapped destination. This is useful when making a very large backup set and you don't wish to have too many files written to the destination. It's configured with a destination to wrap. It will pack files until a "progress" event is sent, so it may be important to configure `progressEveryCount` and/or `progressEveryBytes` (see below); otherwise a backup run may be bundled into an enormous single file before being flushed to the destination.
Example config:

```json
"destination": {
"type": "PackedDestination",
"config": {
"destination": {
"type": "FileSystemDestination",
"config": {
"basePath": "E:\\Backup"
}
}
}
}
```
`SplitDestination` wraps a primary destination and several secondary destinations, sending all chunks to each destination. Useful for backing up to more than one spot at once. The primary destination must support partial reads (`IDestinationWithPartialRead`). Read operations will only read from the primary destination. Note that by default this will load each chunk into memory before distributing it, so very large chunks may cause memory issues. To cache to a temp file on disk instead, set `cacheOnDisk` to true in the config.
Example config:

```json
"destination": {
"type": "SplitDestination",
"config": {
"cacheOnDisk": true,
"verbose": true,
"primaryDestination": {
"type": "FileSystemDestination",
"config": {
"basePath": "E:\\Backup"
}
},
"secondaryDestinations": [
{
"type": "FileSystemDestination",
"config": {
"basePath": "X:\\SecondaryBackup"
}
},
{
"type": "FileSystemDestination",
"config": {
"basePath": "Z:\\FinalDestination"
}
}
]
}
}
```
`S3Destination` writes files to an S3 bucket, with a prefix. It takes a bucket, prefix, access key, and secret key as configuration. This one is not well tested.
Example config:

```json
"destination": {
"type": "S3Destination",
"config": {
"bucket": "my_backup_s3_bucket",
"prefix": "backup",
"accessKey": "XXX",
"secretKey": "XXX"
}
}
```
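In the same spirit, a rough sketch of a custom destination (the interface shapes here are assumptions, not the real definitions; an in-memory destination like this would only be useful for testing):

```csharp
using System;
using System.Collections.Generic;

// Assumed interface shapes, for illustration only; the real definitions may differ.
public interface IDestination
{
    void Write(string key, byte[] data);
    byte[] Read(string key);
    bool Exists(string key);
}

public interface IDestinationWithPartialRead : IDestination
{
    byte[] ReadPartial(string key, long offset, int length);
}

// Hypothetical destination that keeps everything in memory.
public class InMemoryDestination : IDestinationWithPartialRead
{
    private readonly Dictionary<string, byte[]> store = new Dictionary<string, byte[]>();

    public void Write(string key, byte[] data) => store[key] = data;
    public byte[] Read(string key) => store[key];
    public bool Exists(string key) => store.ContainsKey(key);

    public byte[] ReadPartial(string key, long offset, int length)
    {
        var slice = new byte[length];
        Array.Copy(store[key], offset, slice, 0, length);
        return slice;
    }
}
```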
There are a few other options you can apply in the JSON config file:
- `skipFilesContaining` - Skip any file containing any of the strings in this array. Example: `[ "node_modules", ".git", ".svn" ]`. These files will not be backed up.
- `skipCompressionContaining` - Do not compress any file containing any of the strings in this array. Example: `[ ".jpg", ".png", ".etc" ]`
- `progressEveryCount` / `progressEveryBytes` - A snapshot of the current backup process can be sent either every so many files processed (with `progressEveryCount`) or every so many bytes processed (with `progressEveryBytes`). This allows a backup process to resume from the last progress point if it gets interrupted. Note that it will send a snapshot on whichever threshold is hit first (count or bytes), and then reset both counters.
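For example, assuming these options sit at the top level of the set file like the others (the numbers here are arbitrary: a snapshot every 1000 files or every 1 GiB, whichever comes first):

```json
"progressEveryCount": 1000,
"progressEveryBytes": 1073741824,
"skipFilesContaining": [ "node_modules", ".git" ],
"skipCompressionContaining": [ ".jpg", ".png", ".zip" ]
```

A complete set file, combining a grouped source, a packed destination, and the skip options, might look like this: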
```json
{
"source": {
"type": "GroupedSource",
"config": {
"sources": [
{
"name": "Data",
"source": {
"type": "FileSystemSource",
"config": {
"basePath": "C:\\Data"
}
}
},
{
"name": "Desktop",
"source": {
"type": "FileSystemSource",
"config": {
"basePath": "C:\\Users\\Me\\Desktop"
}
}
}
]
}
},
"destination": {
"type": "PackedDestination",
"config": {
"destination": {
"type": "FileSystemDestination",
"config": {
"basePath": "E:\\Backup"
}
}
}
},
"skipFilesContaining": [ "node_modules", ".git", ".svn", "thumbs.db", ".etc" ],
"skipCompressionContaining": [ ".mp2", ".mp3", ".mp4", ".etc" ]
}
```
In general, you execute commands through the dotnet runtime:
```
dotnet PageOfBob.Backup.App.dll commandname --args
```
The commands are as follows:
Backs up files from a source to a destination. Requires a set JSON file.
- `-s|--set <set>` - Path to the set JSON file.
- `-p|--progress <progress>` - Save a "progress" file every `<progress>` files. (Note: this will probably be moved to the set file.)
- `-k|--key <key>` - Encryption key, if using encryption.
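A typical invocation might look like the following (the command name `backup` and the values are placeholders; substitute whatever your build and set file actually use):

```
dotnet PageOfBob.Backup.App.dll backup --set C:\Sets\home.json --key MYSECRETKEY
```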
Restores files from a backup to a source. Requires a set.
- `-s|--set <set>` - Path to the set JSON file.
- `-p|--prefix <prefix>` - Prefix to match for restoring files, or path of a file to restore.
- `-k|--key <key>` - Decryption key, if using encryption.
- `-e|--entry <entry>` - Backup Entry key. Each time a backup is run it creates an entry; when restoring you can choose a specific run (entry) to use to get a file from a point in time.
- `-v|--verify` - Verify only. Does not actually restore any files, but will do (slow) comparisons between the backup and source files and report any discrepancies.
- `-f|--force` - Force. Restoring backed up files will not overwrite existing files by default. This argument will overwrite files.
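For example, to verify everything under a prefix without restoring any files (again, the command name `restore` and the values are placeholders):

```
dotnet PageOfBob.Backup.App.dll restore --set C:\Sets\home.json --key MYSECRETKEY --prefix Desktop --verify
```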
Reports on the contents of the backup. This is useful for searching for files and versions of files. Requires a set.
- `-s|--set <set>` - Path to the set JSON file.
- `-p|--prefix <prefix>` - Will filter reported files to only those matching the prefix.
- `-k|--key <key>` - Decryption key, if using encryption.
- `-e|--entry <entry>` - Only report from a particular backup entry. Each time a backup is run, an entry is created. If not provided, all entries are listed.
- `-o|--out <filename>` - Write the report to a file. Otherwise, the report is dumped to `stdout`.
- `-h|--subhashes` - Files are frequently broken down into smaller chunks (called subhashes). This option will list all of the subhashes associated with a file.
- `-i|--includeDupes` - Usually, files will be listed in the report the first time they appear; with this option, all instances of a file are reported.
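For example, to dump a report of everything matching a prefix to a file (the command name `report` and the values are placeholders):

```
dotnet PageOfBob.Backup.App.dll report --set C:\Sets\home.json --prefix Data --out report.txt
```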
Generates a key that can be used for encryption. Takes no arguments and does not save the key, just reports it to `stdout`.
- Better heuristics for which files to compress
- More configurable filtering options
- Test, test, test