Rework windows "Physical size" calculation, added Unit Tests #9

Mart-Bogdan · 2022-11-19T13:17:30Z

Added:

- Unit tests
- GitHub workflow to run:
  - Tests
  - clippy
  - code formatting checks

Changed the implementation on Windows. Fixes "Deal with max path size on Windows" #5.

The current implementation (the one in this PR) gives a more accurate size for files of around 2k:
it shows them as 4k physical size, whereas CompressedFileSize would report a smaller size.

But it still has the problem of being inconsistent with Windows Explorer for files smaller than 512 bytes.
Such small files are actually stored inside the directory, not in their own clusters, and Explorer reports them as 0.

The actual threshold is not 512; it's dynamic, and for now I don't know how to detect it.
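
As a rough illustration of the 2k-to-4k figure above: a file that lives in its own clusters occupies a whole number of them, so its physical size is the logical size rounded up to the cluster size. The sketch below assumes 4096-byte clusters (cluster size varies per volume), and the name allocated_size is hypothetical. It is not the code from this PR, and it does not model compression or the very small files discussed above.

```rust
/// Round a logical file size up to a whole number of clusters.
///
/// `cluster_size` is an assumed input here; real code would have to
/// query it from the volume the file lives on.
fn allocated_size(logical_size: u64, cluster_size: u64) -> u64 {
    assert!(cluster_size > 0, "cluster size must be non-zero");
    // Number of full clusters, plus one more if there is a partial tail.
    let full = logical_size / cluster_size;
    let partial = if logical_size % cluster_size != 0 { 1 } else { 0 };
    (full + partial) * cluster_size
}
```

With these assumptions, allocated_size(2048, 4096) returns 4096, which matches the "2k shown as 4k" behaviour described above.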

This algorithm also ignores alternate data streams, as did the previous implementation.
To be more precise: alternate data streams can also be compressed, as I learned recently.

Main difference: we now use a syscall that takes a file handle (which we already have open) instead of looking the file up by path.
Querying by path required reformatting long paths into a special format; Rust's std File does this for us under the hood. I first tried to fix this by formatting the string with the \\?\ prefix (\\?\UNC\ for network paths) to enable long paths, but that's a big, tricky function that would have to be maintained, and using GetFileInformationByHandleEx gives us more flexibility for further improvements.
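
To give an idea of the path rewriting that the handle-based approach avoids, here is a minimal sketch of adding the extended-length prefix by hand. Windows documents \\?\ as the prefix that lifts the MAX_PATH limit, with \\?\UNC\ for network shares. The function name to_extended_length is hypothetical, and the sketch assumes the input is already an absolute, backslash-separated path; handling relative paths, forward slashes, and "." or ".." components is what makes a real version tricky.

```rust
/// Prefix an absolute Windows path so that it bypasses the MAX_PATH limit.
///
/// Assumption: `path` is absolute and already normalized. Extended-length
/// paths are passed to the filesystem almost verbatim, so no cleanup
/// happens after the prefix is added.
fn to_extended_length(path: &str) -> String {
    if path.starts_with(r"\\?\") || path.starts_with(r"\\.\") {
        // Already an extended-length or device path: leave it alone.
        path.to_string()
    } else if let Some(rest) = path.strip_prefix(r"\\") {
        // UNC path: \\server\share\... becomes \\?\UNC\server\share\...
        format!(r"\\?\UNC\{}", rest)
    } else {
        // Drive path: C:\dir\file becomes \\?\C:\dir\file
        format!(r"\\?\{}", path)
    }
}
```

Opening the file through std and then querying the handle sidesteps all of this, which is the trade-off described above.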

Mart-Bogdan · 2022-11-19T14:04:22Z

It produces lots of clippy warnings, but I think they should be resolved in a separate PR. After that we could even reject PRs that don't satisfy clippy.

Mart-Bogdan · 2022-11-19T20:44:14Z

I have written a simple benchmark, and it shows that this solution is even faster:

https://github.com/Mart-Bogdan-TMP/dirstat-rs-benchmark/actions/runs/3504907900/jobs/5870888010

physical size/new       time:   [341.84 ms 348.53 ms 356.52 ms]
physical size/old       time:   [595.32 ms 603.11 ms 611.31 ms]
logical size/new        time:   [330.95 ms 332.91 ms 334.89 ms]
logical size/old        time:   [343.66 ms 345.34 ms 347.08 ms]

But we need to take these results with a grain of salt. They only measure function/syscall overhead; the filesystem cache absorbs most of the I/O.

(If we want to exclude the FS cache from the equation, we would need another testing methodology that flushes the caches. It's possible, at least on Linux.)

On my local machine I got the following results:

physical size/new       time:   [128.04 ms 128.74 ms 129.44 ms]
physical size/old       time:   [161.32 ms 162.06 ms 162.83 ms]
logical size/new        time:   [130.70 ms 131.47 ms 132.24 ms]
logical size/old        time:   [131.18 ms 131.90 ms 132.64 ms]
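
To put the numbers above in relative terms, one can divide the old time by the new one. The sketch below takes the middle value of each bracketed triple as the point estimate, which is an assumption about the output format; the helper name speedup is hypothetical.

```rust
/// Ratio of old to new time; values above 1.0 mean the new code is faster.
fn speedup(old_ms: f64, new_ms: f64) -> f64 {
    old_ms / new_ms
}
```

For physical size this gives roughly 1.73x on the CI runner (603.11 / 348.53) and roughly 1.26x locally (162.06 / 128.74). Logical size is essentially unchanged in both runs.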

Mart-Bogdan · 2022-12-11T15:03:27Z

Hello, @scullionw. Can you please review my PR?

scullionw · 2022-12-14T03:34:43Z

Hi! Thanks for the PR!

Sorry about the delays, quite busy these days. This is a pretty big one and I can't review it all right now.

However, if you want to split the PR (tests, workflows, changes to windows impl), we can merge the first two!

Mart-Bogdan · 2022-12-14T23:11:48Z

Hi.
Hm. I guess it's possible. I would need to split out the tests for the new functionality.

I could try to do it by Friday.

P.S. I wanted to work on other stuff regarding this app as well.

Mart-Bogdan · 2022-12-16T21:33:57Z

The tests fail without the code changes. I guess I won't commit tests that don't work.

Mart-Bogdan · 2022-12-16T23:06:58Z

I have created PR #10 with tests

Mart-Bogdan added 13 commits April 23, 2022 03:25

Fix problem with MAX_PATH on windows + UnitTest

20e85a5

* Manually implemented windows File Namespace name generation * added some unit tests closes scullionw#5

Add Check and Lint (fmt/cargo check/clippy)

e0a6df4

add tests workflow

987c40b

Do not cache target dir

f48e6f9

looks like windows is incorrect runner name

8c2e88a

see: https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources

Added test results to workflow

fb1e8a3

force test to fail

3b956b7

Matrix tests

6cb099a

added rstest to make combination of parameters, and changed parameter…

9a8c477

…s for windows Use COMPRESSION_FORMAT_LZNT1 instead of COMPRESSION_FORMAT_DEFAULT, perhaps it would help on WinServer

It actually worked for win-server. So expanding tests.

ba0b162

Added comment regarding Windows implementation.

f3c6c12

Improved CLI --help description for --apparent flag

d162b0e

Mart-Bogdan force-pushed the max-path branch from 452bbae to d162b0e Compare November 19, 2022 13:29

update imported GH actions version in workflows and dereased project …

22b6a11

…version as it was accidentally increased twice, skiping one version number.

higersky added a commit to higersky/dirstat-rs that referenced this pull request Apr 25, 2024

Use windows-sys to merge pull request scullionw#9 from scullionw/dirs…

ad2c9d7

…tat-rs
