Releases: mhx/dwarfs
dwarfs-0.7.0
This release took much longer than anticipated, but comes with a rather big surprise (for me, at least): Windows support! I didn't expect this to happen just yet, especially given that I haven't really used Windows over the past two decades. My biggest worries were all the dependencies, but fortunately I came across vcpkg and all of a sudden, porting DwarFS to Windows seemed feasible. So here we are, and all the different tools (mkdwarfs, dwarfsck, dwarfsextract and the FUSE driver dwarfs) are now working on Windows.
As of this release, in addition to the "classic" statically linked binaries, DwarFS is also available as a universal binary for each platform. The universal binaries bundle the four main tools (mkdwarfs, dwarfsck, dwarfsextract, dwarfs) in a single, compressed binary that is between 2.5 and 4 MiB in size, a fraction of the size of the standalone binaries. The tools can be accessed either by passing the --tool=<name> option as the first argument, or, more conveniently, by creating symbolic links to the universal binary using the name of the respective tool.
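As a rough illustration of how such a multi-call binary can pick the tool to run, here is a minimal Python sketch. It is not DwarFS's C++ code; the function name `select_tool`, the stripping of a `.exe` suffix and the binary name `dwarfs-universal` in the usage lines are assumptions made for the example.

```python
import os

# The four tools bundled in the universal binary.
TOOLS = {"mkdwarfs", "dwarfsck", "dwarfsextract", "dwarfs"}


def select_tool(argv):
    """Hypothetical dispatcher returning (tool_name, remaining_args).

    --tool=<name> is only recognised as the very first argument; otherwise
    the tool is taken from the name the binary was invoked as, e.g. via a
    symbolic link called 'mkdwarfs'.
    """
    if len(argv) > 1 and argv[1].startswith("--tool="):
        name = argv[1][len("--tool="):]
        rest = argv[2:]
    else:
        name = os.path.basename(argv[0])
        # Assumption: on Windows the link or copy may carry an .exe suffix.
        if name.lower().endswith(".exe"):
            name = name[:-len(".exe")]
        rest = argv[1:]
    if name not in TOOLS:
        raise SystemExit(f"unknown tool: {name}")
    return name, rest


if __name__ == "__main__":
    # Invoked through a symbolic link named after the tool.
    print(select_tool(["/usr/local/bin/mkdwarfs", "-i", "src", "-o", "image.dwarfs"]))
    # Invoked directly, selecting the tool explicitly.
    print(select_tool(["dwarfs-universal", "--tool=dwarfsck", "image.dwarfs"]))
```

Dispatching on the invocation name is what lets the symbolic-link approach work without extra arguments, while --tool=<name> remains available when no link exists.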
New Features
- Windows support. All tools are fully working on Windows, including features such as hard links, symbolic links and Unicode file names. Thanks to WinFsp, the FUSE driver is also working, albeit with a few quirks (1, 2, 3, 4) compared to the Linux version.
- Universal binaries that bundle all tools in a single binary. On Windows, the universal binary supports delayed loading of the WinFsp DLL. This makes the mkdwarfs, dwarfsck and dwarfsextract tools usable without the WinFsp DLL.
- Added support for Brotli compression. This is generally much slower at compression than ZSTD or LZMA, but faster than LZMA at decompression, while offering a better compression ratio than ZSTD. Fixes github #76.
- Added --filter option to support simple (rsync-like) filter rules. This resulted from a discussion on github #6.
- Added --compress-niceness option to mkdwarfs. This lowers the priority of the compression worker threads, which has two advantages: a system running mkdwarfs will generally be more responsive, and the compression threads won't starve themselves of work by taking processing power away from the segmenter that feeds them.
- Added --stdout-progress option to dwarfsextract for use with tools such as yad. Fixes github #117.
- Added --chmod option to mkdwarfs. Fixes github #7.
- Added --input-list option to support reading a list of input files from a file or stdin. At least partially fixes github #6.
- Added support for choosing the file hashing algorithm using the --file-hash option. This allows you to pick a secure hash instead of the default XXH3 hash. Also fixes github #92.
- Added --max-similarity-size option to prevent similarity hashing of huge files. This saves scanning time, especially on slow file systems, while it shouldn't affect compression ratio too much.
- Added --num-scanner-workers option.
- Added support for extracting corrupted file systems with dwarfsextract. This is enabled using the --continue-on-error and, if really needed, --disable-integrity-check options. Fixes github #51.
- Show throughput in the scanning and segmenting phases in mkdwarfs.
- Show how much of a file has been consumed in the segmenting phase in mkdwarfs. Useful primarily for large files.
- New metadata format (v2.5). The only change is the addition of a "preferred path separator". This is used to correctly interpret symbolic links, as this is the only place where path separators are stored in DwarFS at all.
- dwarfs and dwarfsextract now have options to enable performance monitoring. This can provide insight into the latency of various file system operations.
- Unreadable files are now added as empty files instead of being ignored. Fixes github #40.
- Honour user locale settings when formatting numbers.
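The role of the "preferred path separator" stored in the v2.5 metadata can be sketched in a few lines of Python. This assumes, purely for illustration, that a reader swaps the image's preferred separator for the native one when resolving a symlink target; the function name `native_symlink_target` is hypothetical and the actual metadata handling in DwarFS may differ.

```python
import os


def native_symlink_target(stored_target, preferred_sep, native_sep=os.sep):
    """Hypothetical helper: translate a symlink target stored in the image.

    Symlink targets are the only place where path separators are stored,
    so the image records which separator its targets use. A reader on a
    platform with a different separator rewrites the target before use.
    """
    if preferred_sep == native_sep:
        return stored_target
    return stored_target.replace(preferred_sep, native_sep)


if __name__ == "__main__":
    # Image created on Linux, read on Windows.
    print(native_symlink_target("../lib/libfoo.so", "/", "\\"))
    # Image created on Windows, read on Linux.
    print(native_symlink_target("..\\lib\\foo.dll", "\\", "/"))
```

A complete implementation would also have to consider characters that are legal in file names on one platform but act as separators on the other (for example a backslash in a Linux file name); the sketch ignores this.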
Performance improvements
- Added a small offset cache to improve random access as well as sequential read latency for large, fragmented files. This resulted in 100x higher throughput for a case where DwarFS was used to compress raw file system images. The DwarFS FUSE driver is now capable of achieving read throughput of more than 6 GB/s on a Xeon(R) E-2286M machine.
- Bypass the block cache for uncompressed blocks. This saves copying block data to memory unnecessarily and allows us to keep all uncompressed blocks accessible directly through the memory mapping. Partially addresses github #139.
- Improved the de-duplication algorithm to only hash files with the same size. File hashing is delayed until at least one more file with the same size is discovered. This happens automatically and should improve scanning speed, especially on slow file systems.
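The size-gated hashing described above can be illustrated with a small Python sketch. It keeps file contents in memory and uses SHA-256 from hashlib instead of XXH3 (which is not in the Python standard library) purely to stay self-contained; the class and method names are hypothetical and not part of DwarFS.

```python
import hashlib
from collections import defaultdict


class SizeGatedDeduper:
    """Hypothetical sketch: hash a file only once another file of the same size exists."""

    def __init__(self):
        self.pending = {}                # size -> (path, data) seen once, not hashed yet
        self.hashed_sizes = set()        # sizes for which hashing has started
        self.groups = defaultdict(list)  # (size, digest) -> paths with identical content
        self.hash_calls = 0

    def _hash(self, size, path, data):
        self.hash_calls += 1
        digest = hashlib.sha256(data).hexdigest()
        self.groups[(size, digest)].append(path)

    def add(self, path, data):
        size = len(data)
        if size in self.hashed_sizes:
            self._hash(size, path, data)
        elif size in self.pending:
            # Second file with this size: hash the delayed file and this one.
            first_path, first_data = self.pending.pop(size)
            self.hashed_sizes.add(size)
            self._hash(size, first_path, first_data)
            self._hash(size, path, data)
        else:
            # First file with this size: no hashing needed (yet).
            self.pending[size] = (path, data)

    def duplicates(self):
        return [paths for paths in self.groups.values() if len(paths) > 1]


if __name__ == "__main__":
    d = SizeGatedDeduper()
    d.add("a.txt", b"hello")
    d.add("b.bin", b"xyz")    # unique size, never hashed
    d.add("c.txt", b"hello")  # triggers hashing of a.txt and c.txt
    d.add("d.txt", b"world")  # same size as a.txt, hashed immediately
    print(d.duplicates(), d.hash_calls)
```

Files with a unique size stay in `pending` and are never hashed, which is where the savings on slow file systems would come from.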
Bugfixes
- Use folly::hardware_concurrency(). Fixes github #130.
- Handle ARCHIVE_FAILED status from libarchive, which could be triggered by trying to write long path names to old archive formats (e.g. USTAR, which limits path names to 255 characters).
- Properly handle unicode path truncation.
- Support LZ4 compression levels above 9.
- Fix heap-use-after-free in dwarfsextract due to missing archive_write_close() call.
- Fix heap-use-after-free in the Brotli decompressor due to re-allocation of the decompressed block data.
- Default FUSE driver debuglevel to warn in background mode. Fixes github #113.
- Fixed extract_block.py, which was incorrectly using printf instead of print.
Documentation
- Updated file system format documentation to cover headers and section indices.
- Documented how to produce bit-identical images.
- Updated the internal operation section of the mkdwarfs manpage.
Testing
- Lots of new tests for the tools.
- Removed dependency on tar and diff binaries, mainly driven by their unavailability on Windows.
- Added a GitHub workflow-based CI pipeline to avoid regressions and simplify builds.
Other
- The compression code has been made more modular. This should make it much easier to add support for more compression algorithms in the future.
- Started using C++20 features.
- Versioning files are no longer written to the git source tree.
dwarfs-0.7.0-RC6
Features
- Support delayed loading of the WinFsp DLL for the universal binary. This makes the mkdwarfs, dwarfsck and dwarfsextract tools of the universal binary usable without the WinFsp DLL.
Performance
- Optimized the offset cache to improve random read latency as well as sequential read latency. This gave a 100x higher throughput for a case where DwarFS was used to compress raw file system images. Fixes github #142.
Bugfixes
- Fixed building with make instead of ninja. Also fixed building in Debug mode. Fixes github #146.
- Fixed ninja clean.
- Fixed symlink creation for mount.dwarfs/mount.dwarfs2.
Other
- Added CI pipeline.
- Don't write versioning files to source tree.
dwarfs-0.7.0-RC5
Features
- Windows support. All tools can now be built and run on Windows, including the FUSE driver, which makes use of WinFsp. Also fixes github #85.
- Build a "universal" binary that combines mkdwarfs, dwarfsck, dwarfsextract and dwarfs in a single binary. This binary can be used either through symbolic links named after the respective tool, or by passing --tool=<name> as the first argument on the command line.
- Bypass the block cache for uncompressed blocks. This saves copying block data to memory unnecessarily and allows us to keep all uncompressed blocks accessible directly through the memory mapping. Partially addresses github #139.
- Show throughput in the scanning and segmenting phases in mkdwarfs.
- Show how much of a file has been consumed in the segmenting phase. Useful primarily for large files.
- dwarfs and dwarfsextract now have options to enable performance monitoring. This can give insight into the latency of various file system operations.
- Added inode offset cache, which improves read() latency for very fragmented files.
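To illustrate why an offset cache helps with fragmented files, here is a Python sketch. It assumes a file is stored as a list of chunks of known sizes, so that without a cache, finding the chunk for a read offset means walking the chunk list from the start. The sketch precomputes chunk start offsets and remembers the last chunk hit. The class name `ChunkOffsetIndex` is hypothetical, and this is not a description of the cache DwarFS actually implements.

```python
import bisect
from itertools import accumulate


class ChunkOffsetIndex:
    """Hypothetical offset cache for a file stored as many chunks.

    starts[i] is the file offset where chunk i begins; the final entry is
    the file size. A lookup is a binary search instead of a linear walk,
    and the last hit is remembered so sequential reads usually resolve
    without searching at all.
    """

    def __init__(self, chunk_sizes):
        self.starts = [0] + list(accumulate(chunk_sizes))
        self.last = 0

    def locate(self, offset):
        """Return (chunk_index, offset_within_chunk) for a file offset."""
        if not 0 <= offset < self.starts[-1]:
            raise ValueError("offset out of range")
        i = self.last
        if not (self.starts[i] <= offset < self.starts[i + 1]):
            nxt = i + 1
            # Fast path for sequential reads: the chunk right after the last hit.
            if nxt + 1 < len(self.starts) and self.starts[nxt] <= offset < self.starts[nxt + 1]:
                i = nxt
            else:
                # Random access: binary search over the chunk start offsets.
                i = bisect.bisect_right(self.starts, offset) - 1
        self.last = i
        return i, offset - self.starts[i]


if __name__ == "__main__":
    idx = ChunkOffsetIndex([4, 4, 2])
    for off in (0, 5, 9, 3):
        print(off, idx.locate(off))
```

Zero-length chunks are handled because bisect_right picks the last chunk starting at or before the offset, which is always a non-empty chunk containing it.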
Bugfixes
- Use folly::hardware_concurrency(). Fixes github #130.
- Handle ARCHIVE_FAILED status from libarchive, which could be triggered by trying to write long path names to old archive formats.
- Properly handle unicode path truncation.
Documentation
- Update file system format documentation to cover headers and section indices.
Testing
- Lots of new tools tests.
- Remove dependency on tar and diff binaries.
Other
- Switch to C++20.
dwarfs-0.7.0-RC4
Features
- Add --compress-niceness option to mkdwarfs.
dwarfs-0.7.0-RC3
Bugfixes
- Fix heap-use-after-free in dwarfsextract.
- Fix dwarfs benchmark binary.
Features
- Add --stdout-progress option to dwarfsextract. Fixes github #117.
Other
- Reduce amount of test data to speed up compiles and avoid timeouts on Travis.
dwarfs-0.7.0-RC2
Bugfixes
- Fix linking against compression libs. Fixes github #112.
- Default FUSE driver debuglevel to warn in background mode. Fixes github #113.
Features
Documentation
- Document how to produce bit-identical images
- Update internal operation section of mkdwarfs manpage
- Add more documentation details for --file-hash option
Other
- Test image reproducibility for path and similarity ordering
dwarfs-0.7.0-RC1
Bugfixes
- Fixed extract_block.py, which was incorrectly using printf instead of print.
- Support LZ4 compression levels above 9.
Features
- Added --filter option to support simple (rsync-like) filter rules. This was driven by a discussion on github #6.
- Added --input-list option to support reading a list of input files from a file or stdin. At least partially fixes github #6.
- The compression code has been made more modular. This should make it much easier to add support for more compression algorithms in the future.
- Added support for Brotli compression. This is generally much slower at compression than ZSTD or LZMA, but faster than LZMA at decompression, while offering a better compression ratio than ZSTD. Fixes github #76.
- Added support for choosing the file hashing algorithm using the --file-hash option. This allows you to pick a secure hash instead of the default XXH3. Also fixes github #92.
- Improved the de-duplication algorithm to only hash files with the same size. File hashing is delayed until at least one more file with the same size is discovered. This happens automatically and should improve scanning speed, especially on slow file systems.
- Added --max-similarity-size option to prevent similarity hashing of huge files. This saves scanning time, especially on slow file systems, while it shouldn't affect compression ratio too much.
- Honour user locale when formatting numbers.
- Added --num-scanner-workers option.
- Added support for extracting corrupted file systems with dwarfsextract. This is enabled using the --continue-on-error and, if really needed, --disable-integrity-check options. Fixes github #51.
Other
- Added unit tests for the progress class.
- Lots of internal cleanups.
dwarfs-0.6.2
Bugfixes
- Fix #91: image creation reproducibility. Add --no-create-timestamp option, produce deterministic inode numbers and fix an fsst bug that caused symbol tables to be non-deterministic. Images built without a creation timestamp will now be bit-identical.
- Fix #93: only overwrite an existing output file when the --force option is given on the command line.
- Fix #104: extracting large files was causing dwarfsextract to run out of memory. This was fixed by extracting large files in chunks rather than all at once.
- Fix #105: handle strrchr() returning NULL.
- Fix out-of-bounds access (PR #106).
- Fix swapped-out cached block detection (PR #107).
- Fix data race in cached block that was triggered by statistics collection and could cause the process to crash.
- Fix heap-use-after-free when writing section index.
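The fix for #104 boils down to streaming file contents instead of materialising whole files in memory. A minimal Python sketch of that pattern follows; the function name and the 1 MiB default chunk size are assumptions for illustration, not the values or code used by dwarfsextract.

```python
import io

CHUNK_SIZE = 1 << 20  # 1 MiB, an arbitrary example value


def copy_in_chunks(src, dst, chunk_size=CHUNK_SIZE):
    """Copy from one binary file-like object to another in fixed-size chunks.

    Peak memory use is bounded by chunk_size rather than by the file size.
    Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes means end of input
            break
        dst.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    src = io.BytesIO(b"x" * 2_500_000)
    dst = io.BytesIO()
    print(copy_in_chunks(src, dst, chunk_size=64 * 1024))
```

Python's own shutil.copyfileobj follows the same bounded-buffer pattern.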
dwarfs-0.6.1
Bugfixes
- Fix binary installation. The broken installation caused the 0.6.0 binary release to contain test binaries as well as duplicate binaries.
- The fuse2 driver (dwarfs2) was also missing from the 0.6.0 binary release.
dwarfs-0.6.0
Features
- Add support for cache tidying, which releases cache memory when the mounted file system is unused.
- Section index support for speeding up mount times (fixes #48).
Bugfixes
- Fix and simplify static builds as much as possible. Document how to set up a static build environment. This also fixes #75 and #54. Huge shoutout to Maxim Samsonov (@maxirmx) for implementing most of this!
- Fix #71: driver hangs when unmounting.
- Fix #67: dwarfs I/O hangs if call to fuse_reply_iov fails.
- Fix #86: block size bits config issues.
- Various build fixes.