Releases · mhx/dwarfs

11 Jul 19:16

mhx

v0.7.0

3ea7c9e

dwarfs-0.7.0

This release took much longer than anticipated, but comes with a rather big surprise (for me, at least): Windows support! I didn't expect this to happen just yet, especially given that I haven't really used Windows over the past two decades. My biggest worries were all the dependencies, but fortunately I came across vcpkg and all of a sudden, porting DwarFS to Windows seemed feasible. So here we are, and all the different tools (mkdwarfs, dwarfsck, dwarfsextract and the FUSE driver dwarfs) are now working on Windows.

As of this release, in addition to the "classic" statically linked binaries, DwarFS is also available as a universal binary for each platform. The universal binaries bundle the four main tools (mkdwarfs, dwarfsck, dwarfsextract, dwarfs) in a single, compressed binary that is between 2.5 and 4 MiB in size, a fraction of the size of the standalone binaries. The tools can be accessed either by passing the --tool=<name> option as the first argument, or, more conveniently, by creating symbolic links to the universal binary using the name of the respective tool.
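The two ways of selecting a tool described above follow the usual "multi-call binary" pattern. The following is only a minimal sketch of that pattern, not the actual DwarFS implementation: the function names `select_tool` and `main` are hypothetical, and giving `--tool=<name>` precedence over the invocation name is an assumption made for illustration.

```python
import os

# Tool names are taken from the release notes above.
TOOLS = {"mkdwarfs", "dwarfsck", "dwarfsextract", "dwarfs"}


def select_tool(argv):
    """Return (tool_name, remaining_args) for a multi-call invocation.

    Hypothetical helper: assumes --tool=<name> as the first argument
    takes precedence over the name the binary was invoked under.
    """
    if len(argv) > 1 and argv[1].startswith("--tool="):
        return argv[1][len("--tool="):], argv[2:]
    # Fall back to the invocation name (e.g. a symbolic link called
    # "mkdwarfs"), dropping any directory part and file extension.
    name = os.path.splitext(os.path.basename(argv[0]))[0]
    return name, argv[1:]


def main(argv):
    name, args = select_tool(argv)
    if name not in TOOLS:
        raise SystemExit(f"unknown tool: {name!r}")
    # A real binary would jump into the selected tool's entry point here.
    return f"{name} {args}"
```

With this scheme, a symbolic link named after a tool and an explicit `--tool=<name>` argument both end up in the same entry point, which is why a single binary can stand in for all four tools.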

New Features

Windows support. All tools are fully working on Windows, including features such as hard links, symbolic links and Unicode file names. Thanks to WinFsp, the FUSE driver is also working, albeit with a few quirks (1, 2, 3, 4) compared to the Linux version.
Universal binaries that bundle all tools in a single binary. On Windows, the universal binary supports delayed loading of the WinFsp DLL. This makes the mkdwarfs, dwarfsck and dwarfsextract tools usable without the WinFsp DLL.
Added support for Brotli compression. This is generally much slower at compression than ZSTD or LZMA, but faster than LZMA at decompression, while offering a compression ratio better than ZSTD. Fixes github #76.
Added --filter option to support simple (rsync-like) filter rules. This resulted from a discussion on github #6.
Added --compress-niceness option to mkdwarfs. This lowers the priority of the compression worker threads, which has two advantages: a system running mkdwarfs will generally be more responsive, and the compression threads won't starve themselves by taking processing power away from the segmenter.
Added --stdout-progress option to dwarfsextract for use with tools such as yad. Fixes github #117.
Added --chmod option to mkdwarfs. Fixes github #7.
Added --input-list option to support reading a list of input files from a file or stdin. At least partially fixes github #6.
Added support for choosing the file hashing algorithm using the --file-hash option. This allows you to pick a secure hash instead of the default XXH3 hash. Also fixes github #92.
Added --max-similarity-size option to prevent similarity hashing of huge files. This saves scanning time, especially on slow file systems, while it shouldn't affect compression ratio too much.
Added --num-scanner-workers option.
Added support for extracting corrupted file systems with dwarfsextract. This is enabled using the --continue-on-error and, if really needed, --disable-integrity-check options. Fixes github #51.
Show throughput in the scanning and segmenting phases in mkdwarfs.
Show how much of a file has been consumed in the segmenting phase in mkdwarfs. Useful primarily for large files.
New metadata format (v2.5). The only change is the addition of a "preferred path separator". This is used to correctly interpret symbolic links, as this is the only place where path separators are stored in DwarFS at all.
dwarfs and dwarfsextract now have options to enable performance monitoring. This can provide insight into the latency of various file system operations.
Unreadable files are now added as empty files instead of being ignored. Fixes github #40.
Honour user locale settings when formatting numbers.

Performance improvements

Added a small offset cache to improve random access as well as sequential read latency for large, fragmented files. This gave a 100x higher throughput for a case where DwarFS was used to compress raw file system images. The DwarFS FUSE driver is now capable of achieving read throughput of more than 6 GB/s on a Xeon(R) E-2286M machine.
Bypass the block cache for uncompressed blocks. This saves copying block data to memory unnecessarily and allows us to keep all uncompressed blocks accessible directly through the memory mapping. Partially addresses github #139.
Improved de-duplication algorithm to only hash files with the same size. File hashing is delayed until at least one more file with the same size is discovered. This happens automatically and should improve scanning speed, especially on slow file systems.
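The size-first de-duplication described in the last item can be sketched roughly as follows. This is an illustration under stated assumptions, not the mkdwarfs code: files are modelled as in-memory `(path, bytes)` pairs, SHA-256 from the standard library stands in for the default XXH3 hash, and the name `find_duplicates` is hypothetical.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files):
    """files: iterable of (path, content_bytes) pairs.

    Returns (duplicate_groups, hashed_paths). A file is only hashed once
    a second file with the same size has been seen.
    """
    pending = {}                   # size -> first file of that size, not yet hashed
    by_digest = defaultdict(list)  # (size, digest) -> paths with that content
    hashed = []                    # paths that actually had to be hashed

    def hash_file(path, data):
        hashed.append(path)
        # SHA-256 is a stand-in; the release notes name XXH3 as the default.
        digest = hashlib.sha256(data).hexdigest()
        by_digest[(len(data), digest)].append(path)

    for path, data in files:
        size = len(data)
        if size not in pending:
            pending[size] = (path, data)  # unique size so far: delay hashing
            continue
        if pending[size] is not None:
            hash_file(*pending[size])     # second file of this size: hash the delayed one
            pending[size] = None
        hash_file(path, data)

    groups = [paths for paths in by_digest.values() if len(paths) > 1]
    return groups, hashed
```

Files whose size is unique are never read for hashing at all, which is where the scanning-time saving on slow file systems would come from.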

Bugfixes

Use folly::hardware_concurrency(). Fixes github #130.
Handle ARCHIVE_FAILED status from libarchive, which could be triggered by trying to write long path names to old archive formats (e.g. USTAR, which limits path names to 255 characters).
Properly handle unicode path truncation.
Support LZ4 compression levels above 9.
Fix heap-use-after-free in dwarfsextract due to missing archive_write_close() call.
Fix heap-use-after-free in brotli decompressor due to re-allocation of the decompressed block data.
Default FUSE driver debuglevel to warn in background mode. Fixes github #113.
Fixed extract_block.py, which was incorrectly using printf instead of print.

Documentation

Updated file system format documentation to cover headers and section indices.
Documented how to produce bit-identical images.
Updated internal operation section of mkdwarfs manpage.

Testing

Lots of new tools tests.
Removed dependency on tar and diff binaries, mainly driven by their unavailability on Windows.
Added GitHub workflow based CI pipeline to avoid regressions and simplify builds.

Other

The compression code has been made more modular. This should make it much easier to add support for more compression algorithms in the future.
Started using C++20 features.
Versioning files are no longer written to the git source tree.

09 Jul 19:31

mhx

v0.7.0-RC6

a43629d

dwarfs-0.7.0-RC6 Pre-release

Features

Support delayed loading of the WinFsp DLL for the universal binary. This makes the mkdwarfs, dwarfsck and dwarfsextract tools of the universal binary usable without the WinFsp DLL.

Performance

Optimized the offset cache to improve random read latency as well as sequential read latency. This gave a 100x higher throughput for a case where DwarFS was used to compress raw file system images. Fixes github #142.

Bugfixes

Fixed building with make instead of ninja. Also fixed building in Debug mode. Fixes github #146.
Fixed ninja clean.
Fixed symlink creation for mount.dwarfs/mount.dwarfs2.

Other

Added CI pipeline.
Don't write versioning files to source tree.

04 Jul 13:32

mhx

v0.7.0-RC5

78648e9

dwarfs-0.7.0-RC5 Pre-release

Features

Windows support. All tools can now be built and run on Windows, including the FUSE driver, which makes use of WinFsp. Also fixes github #85.
Build a "universal" binary that combines mkdwarfs, dwarfsck, dwarfsextract and dwarfs in a single binary. This binary can be used either through symbolic links with the proper names of the tool, or by passing --tool=<name> as the first argument on the command line.
Bypass the block cache for uncompressed blocks. This saves copying block data to memory unnecessarily and allows us to keep all uncompressed blocks accessible directly through the memory mapping. Partially addresses github #139.
Show throughput in the scanning and segmenting phases in mkdwarfs.
Show how much of a file has been consumed in the segmenting phase. Useful primarily for large files.
dwarfs and dwarfsextract now have options to enable performance monitoring. This can give insight into the latency of various file system operations.
Added inode offset cache, which improves read() latency for very fragmented files.

Bugfixes

Use folly::hardware_concurrency(). Fixes github #130.
Handle ARCHIVE_FAILED status from libarchive, which could be triggered by trying to write long path names to old archive formats.
Properly handle unicode path truncation.

Documentation

Update file system format documentation to cover headers and section indices.

Testing

Lots of new tools tests.
Remove dependency on tar and diff binaries.

Other

Switch to C++20.

24 Dec 16:45

mhx

v0.7.0-RC4

3dfad5a

dwarfs-0.7.0-RC4 Pre-release

Features

Add --compress-niceness option to mkdwarfs.

20 Nov 12:52

mhx

v0.7.0-RC3

4114688

dwarfs-0.7.0-RC3 Pre-release

Bugfixes

Fix heap-use-after-free in dwarfsextract.
Fix dwarfs benchmark binary.

Features

Add --stdout-progress option to dwarfsextract. Fixes github #117.

Other

Reduce amount of test data to speed up compiles and avoid timeouts on travis.

17 Nov 12:39

mhx

v0.7.0-RC2

bd09004

dwarfs-0.7.0-RC2 Pre-release

Bugfixes

Fix linking against compression libs. Fixes github #112.
Default FUSE driver debuglevel to warn in background mode. Fixes github #113.

Features

Add --chmod option. Fixes github #7.
Add unreadable files as empty files. Fixes github #40.

Documentation

Document how to produce bit-identical images
Update internal operation section of mkdwarfs manpage
Add more documentation details for --file-hash option

Other

Test image reproducibility for path and similarity ordering

08 Nov 13:12

mhx

v0.7.0-RC1

c743feb

dwarfs-0.7.0-RC1 Pre-release

Bugfixes

Fixed extract_block.py, which was incorrectly using printf instead of print.
Support LZ4 compression levels above 9.

Features

Added --filter option to support simple (rsync-like) filter rules. This was driven by a discussion on github #6.
Added --input-list option to support reading a list of input files from a file or stdin. At least partially fixes github #6.
The compression code has been made more modular. This should make it much easier to add support for more compression algorithms in the future.
Added support for Brotli compression. This is generally much slower at compression than ZSTD or LZMA, but faster than LZMA at decompression, while offering a compression ratio better than ZSTD. Fixes github #76.
Added support for choosing the file hashing algorithm using the --file-hash option. This allows you to pick a secure hash instead of the default XXH3. Also fixes github #92.
Improved de-duplication algorithm to only hash files with the same size. File hashing is delayed until at least one more file with the same size is discovered. This happens automatically and should improve scanning speed, especially on slow file systems.
Added --max-similarity-size option to prevent similarity hashing of huge files. This saves scanning time, especially on slow file systems, while it shouldn't affect compression ratio too much.
Honour user locale when formatting numbers.
Added --num-scanner-workers option.
Added support for extracting corrupted file systems with dwarfsextract. This is enabled using the --continue-on-error and, if really needed, --disable-integrity-check options. Fixes github #51.

Other

Added unit tests for progress class.
Lots of internal cleanups.

24 Oct 12:14

mhx

v0.6.2

0b060a5

dwarfs-0.6.2

Bugfixes

Fix #91: image creation reproducibility. Add --no-create-timestamp option, produce deterministic inode numbers and fix fsst bug that causes symbol tables to be non-deterministic. Images built while omitting create timestamps will now be bit-identical.
Fix #93: only overwrite existing output file when --force option given on command line.
Fix #104: extracting large files was causing dwarfsextract to OOM. This was fixed by extracting large files in chunks rather than all at once.
Fix #105: handle strrchr() returning NULL.
Fix out-of-bounds access (PR #106).
Fix swapped-out cached block detection (PR #107).
Fix data race in cached block that was triggered by statistics collection and could cause the process to crash.
Fix heap-use-after-free when writing section index.
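The fix for #104 above (extracting large files in chunks rather than all at once) follows a general pattern that keeps memory use bounded by the chunk size instead of the file size. A minimal sketch of that pattern, using in-memory streams and a hypothetical `copy_in_chunks` helper rather than the actual dwarfsextract code:

```python
import io


def copy_in_chunks(src, dst, chunk_size=1 << 20):
    """Copy src to dst, holding at most chunk_size bytes in memory at a time.

    Returns the total number of bytes copied. The 1 MiB default is an
    arbitrary choice for illustration.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read signals end of input
            break
        dst.write(chunk)
        total += len(chunk)
    return total


# Usage: copy 2500 bytes through a 1024-byte window.
source = io.BytesIO(b"x" * 2500)
target = io.BytesIO()
copied = copy_in_chunks(source, target, chunk_size=1024)
```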

11 Jun 20:55

mhx

v0.6.1

b1e4667

dwarfs-0.6.1

Bugfixes

Fix binary installation. This caused the 0.6.0 binary release to contain test binaries as well as duplicate binaries.
The fuse2 driver (dwarfs2) was also missing in the 0.6.0 binary release.

11 Jun 20:15

mhx

v0.6.0

76b6ffd

dwarfs-0.6.0

Features

Add support for cache tidying, which releases cache memory when the mounted file system is unused.
Section index support for speeding up mount times (fixes #48).

Bugfixes

Fix and simplify static builds as much as possible. Document how to set up a static build environment. This also fixes #75 and #54. Huge shoutout to Maxim Samsonov (@maxirmx) for implementing most of this!
Fix #71: driver hangs when unmounting
Fix #67: dwarfs I/O hangs if call to fuse_reply_iov fails
Fix #86: block size bits config issues
Various build fixes.

Contributors

maxirmx
