
Memory allocation issue when running on windows #457

Open · hmarko opened this issue Dec 31, 2024 · 10 comments

hmarko commented Dec 31, 2024

Hi.

I'm trying to count files in a ~30M-file dataset over SMB.
Is there anything I can do to overcome this, or have I reached the maximum scale of dust?
Thanks!

.\dust -F -j -r -d 4 -n 100 -s 400000 -f \\server\share$\Groups
Indexing: \\server\share$\Groups 9949070 files, 9.5M ... /memory allocation of 262144 bytes failed


hmarko commented Jan 1, 2025

Just an update: running against the same repository from a Linux client completes successfully.
I suspect this issue is relevant only to the Windows version.

bootandy (Owner) commented

Can you try running dust with a larger stack, e.g. dust -S 1073741824? -S lets you specify the stack size, so you can try increasing / decreasing that number and see if Windows sorts itself out.


hmarko commented Jan 20, 2025

C:\DUST>C:\DUST\dust.exe -S 1073741824 -D -p -j -r -f -n 100 -d 7 -z 200000 "\\srv\c$\folder"
Indexing: \\srv\c$\folder 12401021 files, 11M ... \memory allocation of 262144 bytes failed

bootandy (Owner) commented

I'm not sure there is much I can do here. If Windows is failing to assign enough memory for dust to run, that may be out of my hands.

I'd recommend repeatedly halving the number passed to -S, then repeatedly doubling it, and seeing if you can get a good run.
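
If it helps, here is a rough sketch of automating that search (the target path is a placeholder, the -f/-d/-n flags are taken from your earlier commands, and it assumes dust exits with a non-zero status when the allocation failure happens):

#!/bin/bash
# Sketch: try a few stack sizes around a starting point and report which
# ones let dust finish. Pass the path to scan as the first argument.
TARGET="$1"
base=1073741824
for size in $((base / 4)) $((base / 2)) "$base" $((base * 2)) $((base * 4)); do
    echo "=== trying -S $size ==="
    if dust -S "$size" -f -d 4 -n 100 "$TARGET" > /dev/null; then
        echo "ok with -S $size"
    else
        echo "failed with -S $size"
    fi
done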


hmarko commented Jan 25, 2025

I see the same thing on Linux on file systems with many millions of files.

I will try to play with -S, but as far as I can see it is a general scalability issue.

By the way, did you try it on file systems with 20-30 million files or more?

bootandy (Owner) commented

The same on Linux? OK, let me try to recreate it on Linux.

Using these two scripts I made a large number of files on my ext4 filesystem:

cat ~/temp/many_files/make.sh
#!/bin/bash
# Create 1000 small files of random size (roughly 1 KiB to 33 KiB).
for n in {1..1000}; do
    dd if=/dev/urandom of="file$(printf '%03d' "$n").bin" bs=1 count=$(( RANDOM + 1024 ))
done


cat ~/temp/many_files/silly4/make.sh
#!/bin/bash
# Create 1000 directories, each containing 9009 empty files.
for n in {1..1000}; do
    mkdir "$n"
    touch "$n"/bspl{00001..09009}.$n
done


Gives:

andy:(0):~/dev/rust/dust$ dust -f ~/temp/ -n 10
    99,003     ┌── many_small │█░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │   0%
   599,419     ├── many_small2│██░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │   1%
   900,982     ├── silly2     │███░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │   2%
   999,031     ├── silly      │████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │   2%
 2,232,767     ├── silly3     │████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │   5%
 9,009,001     ├── silly4     │██████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │  22%
 9,009,001     ├── silly5     │██████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │  22%
 9,009,001     ├── silly6     │██████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │  22%
 9,009,001     ├── silly7     │██████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │  22%
40,887,211   ┌─┴ many_files   │████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ │ 100%
40,887,212 ┌─┴ temp           │████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ │ 100%

bootandy (Owner) commented

I think by the time you are tracking a few tens of millions of files you are pushing the memory limits of the average system. htop certainly wasn't very happy when I ran the above ^
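
As a rough back-of-envelope (the per-entry cost here is a guess, not something I measured in dust): if each tracked entry costs on the order of 300 bytes for its path and bookkeeping, then 30,000,000 entries is already around 30,000,000 × 300 ≈ 9 GB before anything else.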


hmarko commented Jan 26, 2025

I ran a command identical to yours and it worked.
In my use case there are a few differences that may be related:

  1. I use SMB or NFS to access the file system over the network
  2. My directory structure is more complex (it can get deep and narrow)
  3. There are long directory and file names (see the sketch below)

Anyway, the servers I use have 32 GB of RAM and are doing nothing else.
Is there any way I can use them to help debug this?
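
For what it's worth, a tree with that deep/narrow, long-name shape can be approximated with a script along these lines (the depth, fan-out, and name lengths are arbitrary placeholders, and this only mimics the shape, not the 30M-file scale):

#!/bin/bash
# Sketch: build a deep, narrow tree with long directory and file names.
set -e
long=$(printf 'x%.0s' {1..120})   # a 120-character name component
mkdir -p deep_tree
cd deep_tree
for depth in {1..30}; do
    mkdir "${long}_$depth"
    cd "${long}_$depth"
    # a handful of long-named files at each level
    for f in {1..20}; do
        touch "${long}_file_$f.dat"
    done
done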

Thanks!

bootandy (Owner) commented

I'm not sure I can offer much more.

Adding '-d' doesn't make it use less memory.

I can only suggest cd-ing into a subdirectory so dust has less data to trawl through.
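
For example, something along these lines would run dust separately on each top-level subdirectory, so no single run has to hold the whole tree in memory at once (the mount path is a placeholder; -f/-d/-n are from your earlier commands):

#!/bin/bash
# Sketch: scan each top-level subdirectory of the share separately.
TARGET="/mnt/share/Groups"   # placeholder for the mounted share
for dir in "$TARGET"/*/; do
    echo "=== $dir ==="
    dust -f -d 4 -n 100 "$dir"
done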


hmarko commented Jan 27, 2025

Thanks!

I will learn some Rust and run some debugging myself.

I will let you know if something pops up.
