Releases: mhx/dwarfs

dwarfs-0.7.0

11 Jul 19:16
@mhx mhx
3ea7c9e

This release took much longer than anticipated, but comes with a rather big surprise (for me, at least): Windows support! I didn't expect this to happen just yet, especially given that I haven't really used Windows over the past two decades. My biggest worries were all the dependencies, but fortunately I came across vcpkg and all of a sudden, porting DwarFS to Windows seemed feasible. So here we are, and all the different tools (mkdwarfs, dwarfsck, dwarfsextract and the FUSE driver dwarfs) are now working on Windows.

As of this release, in addition to the "classic" statically linked binaries, DwarFS is also available as a universal binary for each platform. The universal binaries bundle the four main tools (mkdwarfs, dwarfsck, dwarfsextract, dwarfs) in a single, compressed binary that is between 2.5 and 4 MiB in size, a fraction of the size of the standalone binaries. The tools can be accessed either by passing the --tool=<name> option as the first argument, or, more conveniently, by creating symbolic links to the universal binary using the name of the respective tool.
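The two ways of selecting a tool described above follow the usual "multi-call binary" pattern. The sketch below is only an illustration of that pattern in Python, not the actual DwarFS implementation: the function name `select_tool` is hypothetical, and stripping a `.exe` suffix on Windows is an assumption.

```python
import os

# Tool names as listed in the release notes.
TOOLS = ("mkdwarfs", "dwarfsck", "dwarfsextract", "dwarfs")


def select_tool(argv):
    """Return (tool, remaining_args); tool is None if nothing was selected.

    Hypothetical helper: illustrates the dispatch idea only.
    """
    args = list(argv[1:])

    # 1. An explicit --tool=<name> passed as the first argument wins.
    if args and args[0].startswith("--tool="):
        name = args[0][len("--tool="):]
        return (name if name in TOOLS else None), args[1:]

    # 2. Otherwise fall back to the name the binary was invoked under,
    #    e.g. through a symbolic link called "mkdwarfs".
    invoked = os.path.basename(argv[0])
    stem, ext = os.path.splitext(invoked)
    if ext.lower() == ".exe":  # assumption: ignore the Windows suffix
        invoked = stem
    if invoked in TOOLS:
        return invoked, args

    return None, args
```

With this scheme, a symbolic link named after a tool behaves like the standalone tool, while the `--tool=<name>` form works without creating any links.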

New Features

  • Windows support. All tools are fully working on Windows, including features such as hard links, symbolic links and Unicode file names. Thanks to WinFsp, the FUSE driver is also working, albeit with a few quirks (1, 2, 3, 4) compared to the Linux version.

  • Universal binaries that bundle all tools in a single binary. On Windows, the universal binary supports delayed loading of the WinFsp DLL. This makes the mkdwarfs, dwarfsck and dwarfsextract tools usable without the WinFsp DLL.

  • Added support for Brotli compression. Brotli is generally much slower at compression than ZSTD or LZMA, but faster than LZMA at decompression, while offering a compression ratio better than ZSTD. Fixes github #76.

  • Added --filter option to support simple (rsync-like) filter rules. This resulted from a discussion on github #6.

  • Added --compress-niceness option to mkdwarfs. This lowers the priority of the compression worker threads, which has two advantages: a system running mkdwarfs will generally be more responsive, and the compression threads won't starve themselves by taking processing power away from the segmenter.

  • Added --stdout-progress option to dwarfsextract for use with tools such as yad. Fixes github #117.

  • Added --chmod option to mkdwarfs. Fixes github #7.

  • Added --input-list option to support reading a list of input files from a file or stdin. At least partially fixes github #6.

  • Added support for choosing the file hashing algorithm using the --file-hash option. This allows you to pick a secure hash instead of the default XXH3 hash. Also fixes github #92.

  • Added --max-similarity-size option to prevent similarity hashing of huge files. This saves scanning time, especially on slow file systems, while it shouldn't affect compression ratio too much.

  • Added --num-scanner-workers option.

  • Added support for extracting corrupted file systems with dwarfsextract. This is enabled using the --continue-on-error and, if really needed, --disable-integrity-check options. Fixes github #51.

  • Show throughput in the scanning and segmenting phases in mkdwarfs.

  • Show how much of a file has been consumed in the segmenting phase in mkdwarfs. Useful primarily for large files.

  • New metadata format (v2.5). The only change is the addition of a "preferred path separator". This is used to correctly interpret symbolic links, as this is the only place where path separators are stored in DwarFS at all.

  • dwarfs and dwarfsextract now have options to enable performance monitoring. This can provide insight into the latency of various file system operations.

  • Unreadable files are now added as empty files instead of being ignored. Fixes github #40.

  • Honour user locale settings when formatting numbers.

Performance improvements

  • Added a small offset cache to improve random access as well as sequential read latency for large, fragmented files. This gave a 100x higher throughput for a case where DwarFS was used to compress raw file system images. The DwarFS FUSE driver is now capable of achieving read throughput of more than 6 GB/s on a Xeon(R) E-2286M machine.

  • Bypass the block cache for uncompressed blocks. This saves copying block data to memory unnecessarily and allows us to keep all uncompressed blocks accessible directly through the memory mapping. Partially addresses github #139.

  • Improved de-duplication algorithm to only hash files with the same size. File hashing is delayed until at least one more file with the same size is discovered. This happens automatically and should improve scanning speed, especially on slow file systems.
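The size-based de-duplication described in the last item can be sketched as follows. This is a minimal illustration under stated assumptions, not the mkdwarfs code: `find_duplicates` is a hypothetical name, file contents are passed in as in-memory bytes for brevity, and SHA-256 from the Python standard library stands in for the hash that DwarFS would actually use.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files, hash_fn=lambda data: hashlib.sha256(data).hexdigest()):
    """files: iterable of (name, content_bytes).

    Returns (duplicate_groups, names_that_were_hashed).
    """
    pending = {}                 # size -> first file of that size, not hashed yet
    by_hash = defaultdict(list)  # (size, digest) -> names with that content
    hashed = []                  # records which files actually got hashed

    def hash_file(name, data):
        hashed.append(name)
        by_hash[(len(data), hash_fn(data))].append(name)

    for name, data in files:
        size = len(data)
        if size not in pending:
            # Unique size so far: defer hashing, it may never be needed.
            pending[size] = (name, data)
            continue
        if pending[size] is not None:
            # A second file of this size appeared: hash the deferred one now.
            hash_file(*pending[size])
            pending[size] = None
        hash_file(name, data)

    groups = [names for names in by_hash.values() if len(names) > 1]
    return groups, hashed
```

Files whose size is unique are never read for hashing at all, which is where the saving on slow file systems would come from.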

Bugfixes

  • Use folly::hardware_concurrency(). Fixes github #130.

  • Handle ARCHIVE_FAILED status from libarchive, which could be triggered by trying to write long path names to old archive formats (e.g. USTAR, which limits path names to 255 characters).

  • Properly handle Unicode path truncation.

  • Support LZ4 compression levels above 9.

  • Fix heap-use-after-free in dwarfsextract due to missing archive_write_close() call.

  • Fix heap-use-after-free in brotli decompressor due to re-allocation of the decompressed block data.

  • Default FUSE driver debuglevel to warn in background mode. Fixes github #113.

  • Fixed extract_block.py, which was incorrectly using printf instead of print.

Documentation

Testing

  • Lots of new tools tests.

  • Removed dependency on tar and diff binaries, mainly driven by their unavailability on Windows.

  • Added GitHub workflow based CI pipeline to avoid regressions and simplify builds.

Other

  • The compression code has been made more modular. This should make it much easier to add support for more compression algorithms in the future.

  • Started using C++20 features.

  • Versioning files are no longer written to the git source tree.

dwarfs-0.7.0-RC6

09 Jul 19:31
@mhx mhx
Pre-release

Features

  • Support delayed loading of the WinFsp DLL for the universal binary. This makes the mkdwarfs, dwarfsck and dwarfsextract tools of the universal binary usable without the WinFsp DLL.

Performance

  • Optimized the offset cache to improve random read latency as well as sequential read latency. This gave a 100x higher throughput for a case where DwarFS was used to compress raw file system images. Fixes github #142.

Bugfixes

  • Fixed building with make instead of ninja. Also fixed building in Debug mode. Fixes github #146.
  • Fixed ninja clean.
  • Fixed symlink creation for mount.dwarfs/mount.dwarfs2.

Other

  • Added CI pipeline.
  • Don't write versioning files to source tree.

dwarfs-0.7.0-RC5

04 Jul 13:32
@mhx mhx
Pre-release

Features

  • Windows support. All tools can now be built and run on Windows, including the FUSE driver, which makes use of WinFsp. Also fixes github #85.
  • Build a "universal" binary that combines mkdwarfs, dwarfsck, dwarfsextract and dwarfs in a single binary. This binary can be used either through symbolic links with the proper names of the tool, or by passing --tool=<name> as the first argument on the command line.
  • Bypass the block cache for uncompressed blocks. This saves copying block data to memory unnecessarily and allows us to keep all uncompressed blocks accessible directly through the memory mapping. Partially addresses github #139.
  • Show throughput in the scanning and segmenting phases in mkdwarfs.
  • Show how much of a file has been consumed in the segmenting phase. Useful primarily for large files.
  • dwarfs and dwarfsextract now have options to enable performance monitoring. This can give insight into the latency of various file system operations.
  • Added inode offset cache, which improves read() latency for very fragmented files.

Bugfixes

  • Use folly::hardware_concurrency(). Fixes github #130.
  • Handle ARCHIVE_FAILED status from libarchive, which could be triggered by trying to write long path names to old archive formats.
  • Properly handle Unicode path truncation.

Documentation

  • Update file system format documentation to cover headers and section indices.

Testing

  • Lots of new tools tests.
  • Remove dependency on tar and diff binaries.

Other

  • Switch to C++20.

dwarfs-0.7.0-RC4

24 Dec 16:45
@mhx mhx
Pre-release

Features

  • Add --compress-niceness option to mkdwarfs.

dwarfs-0.7.0-RC3

20 Nov 12:52
@mhx mhx
Pre-release

Bugfixes

  • Fix heap-use-after-free in dwarfsextract.

  • Fix dwarfs benchmark binary.

Features

  • Add --stdout-progress option to dwarfsextract. Fixes github #117.

Other

  • Reduce amount of test data to speed up compiles and avoid timeouts on travis.

dwarfs-0.7.0-RC2

17 Nov 12:39
@mhx mhx
Pre-release

Bugfixes

  • Fix linking against compression libs. Fixes github #112.

  • Default FUSE driver debuglevel to warn in background mode. Fixes github #113.

Features

  • Add --chmod option. Fixes github #7.

  • Add unreadable files as empty files. Fixes github #40.

Documentation

  • Document how to produce bit-identical images

  • Update internal operation section of mkdwarfs manpage

  • Add more documentation details for --file-hash option

Other

  • Test image reproducibility for path and similarity ordering

dwarfs-0.7.0-RC1

08 Nov 13:12
@mhx mhx
Pre-release

Bugfixes

  • Fixed extract_block.py, which was incorrectly using printf instead of print.

  • Support LZ4 compression levels above 9.

Features

  • Added --filter option to support simple (rsync-like) filter rules. This was driven by a discussion on github #6.

  • Added --input-list option to support reading a list of input files from a file or stdin. At least partially fixes github #6.

  • The compression code has been made more modular. This should make it much easier to add support for more compression algorithms in the future.

  • Added support for Brotli compression. Brotli is generally much slower at compression than ZSTD or LZMA, but faster than LZMA at decompression, while offering a compression ratio better than ZSTD. Fixes github #76.

  • Added support for choosing the file hashing algorithm using the --file-hash option. This allows you to pick a secure hash instead of the default XXH3. Also fixes github #92.

  • Improved de-duplication algorithm to only hash files with the same size. File hashing is delayed until at least one more file with the same size is discovered. This happens automatically and should improve scanning speed, especially on slow file systems.

  • Added --max-similarity-size option to prevent similarity hashing of huge files. This saves scanning time, especially on slow file systems, while it shouldn't affect compression ratio too much.

  • Honour user locale when formatting numbers.

  • Added --num-scanner-workers option.

  • Added support for extracting corrupted file systems with dwarfsextract. This is enabled using the --continue-on-error and, if really needed, --disable-integrity-check options. Fixes github #51.

Other

  • Added unit tests for progress class.

  • Lots of internal cleanups.

dwarfs-0.6.2

24 Oct 12:14
@mhx mhx

Bugfixes

  • Fix #91: image creation reproducibility. Add --no-create-timestamp option, produce deterministic inode numbers and fix fsst bug that causes symbol tables to be non-deterministic. Images built while omitting create timestamps will now be bit-identical.

  • Fix #93: only overwrite existing output file when --force option given on command line.

  • Fix #104: extracting large files was causing dwarfsextract to OOM. This was fixed by extracting large files in chunks rather than all at once.

  • Fix #105: handle strrchr() returning NULL.

  • Fix out-of-bounds access (PR #106).

  • Fix swapped-out cached block detection (PR #107).

  • Fix data race in cached block that was triggered by statistics collection and could cause the process to crash.

  • Fix heap-use-after-free when writing section index.
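The fix for #104 above relies on copying large files in bounded pieces rather than reading them into memory in one go. A minimal sketch of that idea follows; it is not the dwarfsextract code, and both the helper name `copy_in_chunks` and the 1 MiB default chunk size are assumptions made for illustration.

```python
def copy_in_chunks(src, dst, chunk_size=1 << 20):
    """Copy from a readable binary stream to a writable one.

    Peak memory use is bounded by chunk_size rather than by the file size.
    Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        dst.write(chunk)
        total += len(chunk)
    return total
```

Any pair of binary file objects can be used, for example two files opened with `open(path, "rb")` and `open(path, "wb")`.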

dwarfs-0.6.1

11 Jun 20:55
@mhx mhx

Bugfixes

  • Fix binary installation. This caused the 0.6.0 binary release to contain test binaries as well as duplicate binaries.

  • The fuse2 driver (dwarfs2) was also missing in the 0.6.0 binary release.

dwarfs-0.6.0

11 Jun 20:15
@mhx mhx

Features

  • Add support for cache tidying, which releases cache memory when the mounted file system is unused.

  • Section index support for speeding up mount times (fixes #48).

Bugfixes

  • Fix and simplify static builds as much as possible. Document how to set up a static build environment. This also fixes #75 and #54. Huge shoutout to Maxim Samsonov (@maxirmx) for implementing most of this!

  • Fix #71: driver hangs when unmounting

  • Fix #67: dwarfs I/O hangs if call to fuse_reply_iov fails

  • Fix #86: block size bits config issues

  • Various build fixes.