- bus3 - backup to S3
- License
- Contact
- Acknowledgements
- Appendix A; Past improvements
- Appendix B; Performance testing
bus3.py is an experimental backup tool for S3 storage. It uses asyncio throughout to maximize concurrency with a small footprint. It relies on the aiofiles, asyncpg and aioboto3 libraries.
Important notice - bus3 is still under development (experimental) and may or may not work for now.
bus3 is designed to be able to:
- backup files, directories and symbolic/hard links
- preserve extended attributes
- track backup history and file versions
- perform file or chunk (default 64MB) level dedupe
- backup very large files without using up all the memory
- handle a large number of files without using up memory
- maximize concurrency with asyncio (coroutines)
- spawn an async task for each file or directory to back up
- spawn an async task for each object write to S3
- support PostgreSQL as opposed to sqlite3 to avoid the global write lock
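The task-per-file model in the list above can be illustrated with a minimal asyncio sketch. This is not bus3's actual code: `backup_one`, `backup_all`, `run_backup` and the `max_tasks` limit are hypothetical names, and the real file and S3 I/O is replaced by a placeholder.

```python
import asyncio
from typing import Iterable, List


async def backup_one(path: str, limit: asyncio.Semaphore, done: List[str]) -> None:
    """Hypothetical stand-in for backing up a single file or directory."""
    async with limit:            # wait here if too many tasks are already active
        await asyncio.sleep(0)   # placeholder for real async file/S3 I/O
        done.append(path)


async def backup_all(paths: Iterable[str], max_tasks: int = 4) -> List[str]:
    """Spawn one task per path; at most max_tasks run the I/O step at once."""
    limit = asyncio.Semaphore(max_tasks)
    done: List[str] = []
    tasks = [asyncio.create_task(backup_one(p, limit, done)) for p in paths]
    await asyncio.gather(*tasks)
    return done


def run_backup(paths: Iterable[str], max_tasks: int = 4) -> List[str]:
    """Synchronous entry point for the sketch."""
    return asyncio.run(backup_all(paths, max_tasks))
```

Calling `run_backup(["/a", "/b", "/c"], max_tasks=2)` creates three tasks but lets only two of them hold the semaphore at a time, which is one common way to keep memory use bounded while many tasks are pending.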
bus3 splits large files into chunks and stores them as separate objects in S3 storage. It stores file metadata in the database. The database needs to be backed up separately after each backup.
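As a rough illustration of chunk-level dedupe, the sketch below reads a file in fixed-size chunks and stores each distinct chunk once, keyed by a content hash. The use of SHA-256, the in-memory `store` dict (standing in for S3 objects) and the function names are assumptions made for this example; the README does not say how bus3 identifies chunks.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Iterator, List

CHUNK_SIZE = 64 * 1024 * 1024  # 64MB, the default chunk size mentioned above


def iter_chunks(f: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield the file's content chunk by chunk, so it is never fully in memory."""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            return
        yield chunk


def store_file(f: BinaryIO, store: Dict[str, bytes],
               chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Store unseen chunks and return the ordered chunk keys for this file."""
    keys: List[str] = []
    for chunk in iter_chunks(f, chunk_size):
        key = hashlib.sha256(chunk).hexdigest()  # assumed content hash
        if key not in store:                     # dedupe: skip known chunks
            store[key] = chunk
        keys.append(key)
    return keys


# Usage with a tiny chunk size so the effect is visible:
objects: Dict[str, bytes] = {}
chunk_keys = store_file(io.BytesIO(b"aaaabbbbaaaacc"), objects, chunk_size=4)
```

Here the file is described by four chunk keys, but only three objects are stored because the first and third chunks are identical. The ordered key list is the kind of metadata that would have to live in the database.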
- S3 storage
- Not tested with Amazon AWS S3 (yet)
- Linux
- Developed on Fedora 33 and CentOS 8
- Python 3.8 or later
- bus3.py - the backup tool
- bus3.yaml - config file
- May need root privilege to execute
- Prepare S3 storage and a dedicated bucket for bus3.py
- Set up Python 3.8 or later
- Set up Postgres and create a database named bus3
- Install aiofiles
- Install aioboto3 8.3.0 (the latest 9.0 did not seem to work)
- Install asyncpg
- Install pyyaml
- Edit bus3.yaml for S3 storage endpoint, bucket name and directory to back up
- Set up ~/.aws/credentials (e.g., with the AWS CLI)
- Run python bus3.py -b to back up
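Before the first run, the Python-side prerequisites from the steps above can be checked with a short script. This is only a convenience sketch using the standard library; `missing_modules` and `python_ok` are hypothetical helper names, and `yaml` is the import name provided by the pyyaml package.

```python
import importlib.util
import sys
from typing import List, Sequence, Tuple

# Import names of the libraries listed in the installation steps.
REQUIRED_MODULES = ("aiofiles", "aioboto3", "asyncpg", "yaml")


def missing_modules(names: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(min_version: Tuple[int, int] = (3, 8)) -> bool:
    """True if the running interpreter is at least min_version."""
    return sys.version_info >= min_version


# Usage: print(python_ok(), missing_modules())
```

This only confirms that the modules can be found; it does not check their versions, the database, or the S3 credentials.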
https://fedoraproject.org/wiki/PostgreSQL#Installation
- sudo dnf install postgresql-server
- sudo vi /var/lib/pgsql/data/pg_hba.conf
  host all all 127.0.0.1/32 md5
- sudo postgresql-setup --initdb
- sudo systemctl start postgresql
- sudo su - postgres
- createdb bus3
- psql
  ALTER USER postgres PASSWORD '';
bus3.yaml is the configuration file.
root_dir: /<path-to-backup-directory>
s3_config:
  s3_bucket: <bucket name>
  s3_endpoint: https://<S3-storage-URL>:<port>
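A small sanity check of the loaded configuration can catch typos early. The sketch below takes the dict that `yaml.safe_load` would return for bus3.yaml and reports problems. It assumes that `s3_bucket` and `s3_endpoint` are nested under `s3_config`, that `root_dir` is an absolute path, and that the endpoint is an http(s) URL; `config_problems` is a hypothetical helper, not part of bus3.

```python
from typing import Any, Dict, List


def config_problems(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty means the config looks sane."""
    problems: List[str] = []

    root_dir = cfg.get("root_dir")
    if not isinstance(root_dir, str) or not root_dir.startswith("/"):
        problems.append("root_dir must be an absolute path")  # assumed rule

    s3 = cfg.get("s3_config")
    if not isinstance(s3, dict):
        problems.append("s3_config must be a mapping")
        return problems

    if not s3.get("s3_bucket"):
        problems.append("s3_config.s3_bucket is missing")

    endpoint = s3.get("s3_endpoint")
    if not isinstance(endpoint, str) or not endpoint.startswith(("http://", "https://")):
        problems.append("s3_config.s3_endpoint must be an http(s) URL")  # assumed rule

    return problems


# Usage with a literal dict; in practice it would come from yaml.safe_load().
example = {
    "root_dir": "/home/test/py/bus3/test",
    "s3_config": {"s3_bucket": "bus3-bucket", "s3_endpoint": "https://s3.example.local:9000"},
}
```

The bucket name and endpoint in `example` are made-up values for illustration only.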
To back up:
python bus3.py -b
To see backup history/list:
python bus3.py [-l]
Example output:
(bus3) [test@localhost bus3]$ python bus3.py -l
#: date & time backup root directory
0: 2021-06-24 15:31:01 /home/test/py/bus3/test
1: 2021-06-24 15:57:25 /home/test/py/bus3/test
2: 2021-06-24 16:26:53 /home/test/py/bus3/test
3: 2021-06-24 22:34:11 /home/test/py/bus3/test
4: 2021-06-25 07:26:45 /home/test/py/bus3/test
5: 2021-06-25 07:31:05 /home/test/py/bus3/test
6: 2021-06-25 07:41:52 /home/test/py/bus3/test
07:46:42,292 INFO: Completed or gracefully terminated
# is the backup history number (or scan counter).
To restore a directory or file:
python bus3.py -r all|<file/directory-to-restore> <directory-to-be-restored> [<backup-history-number>]
<file/directory-to-restore> can be specified either as a full path (i.e., starting with /) or as a path relative to the backup root directory specified in bus3.yaml. If all is specified, bus3 restores all backed-up files and directories. (Most tests so far specify all.)
If <backup-history-number> is not specified, bus3 restores the latest version.
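The argument rules above can be sketched as two small helpers. This is an illustration, not bus3's implementation: `resolve_restore_path` and `pick_history` are hypothetical names, and treating the highest backup history number as "the latest version" is an assumption based on the listing output shown earlier.

```python
import posixpath
from typing import List, Optional


def resolve_restore_path(target: str, root_dir: str) -> str:
    """Map the restore argument to a full path under the backup root."""
    if target == "all":
        return posixpath.normpath(root_dir)      # restore everything
    if target.startswith("/"):
        return posixpath.normpath(target)        # already a full path
    return posixpath.normpath(posixpath.join(root_dir, target))  # relative to root


def pick_history(history: List[int], requested: Optional[int] = None) -> int:
    """Choose the backup history number to restore from."""
    if not history:
        raise ValueError("no backups recorded")
    if requested is None:
        return max(history)                      # assumed: highest number is latest
    if requested not in history:
        raise ValueError(f"unknown backup history number: {requested}")
    return requested
```

For example, with a backup root of `/home/test/py/bus3/test`, the relative argument `docs/a.txt` resolves to `/home/test/py/bus3/test/docs/a.txt`, and omitting the history number with backups 0 to 6 selects 6.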
Important: Make sure to back up the database after each backup of files/directories with bus3.py.
bus3.py is released under the MIT license.
Kyosuke Achiwa - @kyos_achwan - achiwa912+gmail.com (please replace + with @)
Project Link: https://github.com/achiwa912/bus3
TBD
improvement | supported | comment |
Switch from sqlite3 to postgres | yes | |
Create DB pool | yes | |
Create S3 client pool | yes | |
Reduce local file reads | no | Performance didn't change but increased memory utilization |
Performance tests were conducted in a local environment with locally connected S3 storage (i.e., NOT Amazon AWS S3).
The first test backed up and restored 1000 random 4KB files in a directory.
S3 pool size | max S3 tasks | max DB tasks | backup (files/sec) | restore (files/sec) |
150 | 150 | 96 | 45.2 | 59.9 |
150 | 150 | 150 | 61.1 | 59.1 |
150 | 150 | 256 | 60.9 | 62.7 |
256 | 256 | 256 | 61.8 | 59.5 |
96 | 256 | 256 | 65.8 | 58.3 |
64 | 256 | 256 | 66.9 | 63.0 |
32 | 256 | 256 | 63.9 | 60.0 |
16 | 256 | 256 | 46.5 | 59.4 |
8 | 256 | 256 | 37.9 | 62.7 |
file size (GB) | files | max large buffers | backup (MB/s) | restore (MB/s) |
4 | 2 | 16 | 57.57 | 88.68 |
1 | 1 | 16 | 57.5 | 92.53 |
1 | 2 | 16 | 55.15 | 78.18 |
1 | 4 | 16 | 56.29 | 88.63 |
1 | 8 | 16 | 56.8 | 93.79 |
1 | 8 | 32 | 56.48 | 90.69 |
1 | 16 | 32 | 54.73 | 91.09 |