Handle deleted files on Linux #444

Open
tatref opened this issue Oct 14, 2024 · 4 comments

@tatref

tatref commented Oct 14, 2024

Hi,

On Linux, if a file is deleted while a process still holds an open handle on it, its disk space remains in use but the file is no longer visible on the filesystem (ls, find, etc. will not list it). This is a common sysadmin issue, so I think it would be great to add an option to search for deleted files.

The only way to know that such a file is still taking up space is to walk the file descriptors under /proc/$pid/fd/ and check whether the files they point to still exist.
We can use lsof to show the deleted files:

[root@enterprise ~]# lsof -n | grep deleted
httpd    2357 apache   29u   REG 253,17 3926560     0  1499 /tmp/.NSPR-AFM-3457-9820130.0 (deleted)
mysqld   2588  mysql    4u   REG 253,17      52     0  1495 /tmp/ibY0cXCd (deleted)
mysqld   2588  mysql    5u   REG 253,17    1048     0  1496 /tmp/ibOrELhG (deleted)
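For reference, the /proc walk described above can be sketched in a few lines of Python. This is only an illustration, not code from dust: the function name `deleted_open_files` and the `proc_root` parameter are made up for this example, and it relies on the kernel appending ` (deleted)` to the link target of an unlinked file.

```python
import os

DELETED_SUFFIX = " (deleted)"  # appended by the kernel to the fd link target


def deleted_open_files(proc_root="/proc"):
    """Yield (pid, fd, path) for every descriptor whose target is marked deleted.

    `proc_root` is a hypothetical parameter so the walk can be pointed at a
    test directory instead of the real /proc.
    """
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric entries are process directories
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed in the meantime
            # Skip sockets, pipes, etc., whose targets do not start with "/".
            if target.startswith("/") and target.endswith(DELETED_SUFFIX):
                yield int(entry), int(fd), target[: -len(DELETED_SUFFIX)]
```

Calling `list(deleted_open_files())` as root should return roughly what the lsof command above shows. A file whose real name ends in ` (deleted)` would be a false positive here; checking that `os.stat()` on the descriptor path reports `st_nlink == 0` is probably a more robust test.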

Do you think it would be a worthy feature to add to dust?

@bootandy
Owner

That is an interesting edge case.

I can imagine this being a common problem for sysadmins.

dust works by walking through the filesystem, so if ls won't find a file I don't think dust will either.

I'm currently not keen on adding this feature. Like you said, it would require walking the file descriptors, checking if that path matched the path dust was run for, then working out if it had already been included. Then if I were to show it I'd need a way of marking it as 'different' because it wouldn't be removed if you 'rm' the file.

I think we are probably better served with lsof -n | grep deleted.
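The "does this path belong to the directory dust was run for" step could look something like the sketch below. The helper names `under_root` and `deleted_under_root` are hypothetical and not part of dust; the sketch only covers the path check, not the display side.

```python
import os


def under_root(path, root):
    """Return True if `path` lies inside the directory tree rooted at `root`."""
    root = os.path.abspath(root)
    path = os.path.abspath(path)
    # commonpath compares whole path components, so /tmp/ab is not under /tmp/a.
    return os.path.commonpath([path, root]) == root


def deleted_under_root(deleted_paths, root):
    """Keep only the deleted paths that fall under the scanned directory."""
    return [p for p in deleted_paths if under_root(p, root)]
```

For example, `deleted_under_root(["/tmp/ibY0cXCd", "/var/log/x"], "/tmp")` keeps only the first path. Working out whether an entry was already counted is better done on the device and inode numbers than on the path, since two different deleted files can share the same former name.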

@tatref
Author

tatref commented Oct 18, 2024

Yes, you are correct: dust and ls can't find the file through filesystem syscalls; the only way is through /proc/.

The workflow you describe is what I imagined. Yes, rm can't delete the file, but the space is still used on the FS. We could maybe add a flag such as --list-deleted, then display the deleted files alongside the others. Or we could skip the flag and list deleted files with a different color/pattern.

The thing is, using lsof can be complicated: the size shown is not the space used on the FS (it does not take sparse files into account), and the same file can be listed multiple times. Also, there is no easy way of grouping the files by directory, as dust does.
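To make those three points concrete, here is a rough Python sketch (not dust code; `allocated_bytes` and `usage_by_dir` are names invented for this example). It assumes that calling stat on /proc/<pid>/fd/<n> follows the link to the still-open file, that `st_blocks` is counted in 512-byte units as on Linux, and that the same file can be recognised by its (device, inode) pair.

```python
import os
from collections import defaultdict


def allocated_bytes(st):
    """Space actually allocated on disk; smaller than st_size for sparse files."""
    return st.st_blocks * 512  # st_blocks is in 512-byte units on Linux


def usage_by_dir(fd_links):
    """Sum allocated bytes per parent directory, counting each file once.

    `fd_links` is an iterable of (fd_path, original_path) pairs, where fd_path
    is something like /proc/<pid>/fd/<n> and original_path is the former name
    of the file with the " (deleted)" suffix already stripped.
    """
    seen = set()
    totals = defaultdict(int)
    for fd_path, original_path in fd_links:
        try:
            st = os.stat(fd_path)  # follows the link to the open file
        except OSError:
            continue  # descriptor closed or not readable
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # same file held through several descriptors or processes
        seen.add(key)
        totals[os.path.dirname(original_path)] += allocated_bytes(st)
    return dict(totals)
```

The result is a plain mapping from directory to bytes, which is roughly the shape a tree view would need in order to merge these entries with regular files.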

I can help make a demo implementation if you want.

@bootandy
Owner

If you were to have this --list-deleted flag, do you think you would want the deleted files merged in with the regular files? I'm wondering whether it should show ONLY the deleted files.

If you are hunting down lost disk space your procedure for actual files in the filesystem is going to be different than for processes that are holding on to deleted files.

So I'm proposing this:

dust --show-deleted 
./file_still_used_by_proc
./other_file_still_used_by_proc

<does not show files on the filesystem only deleted files>

What do you think about this?

@tatref
Author

tatref commented Oct 23, 2024

Hi,

I think it's better to merge the deleted files with the regular files, because in the end, both are taking up space.

The names of deleted files are suffixed by the kernel with (deleted), so if these files are visible in the output (not too deep in the tree), they will show up like so:

100M   ┌── img.dd (deleted)                                        │████████████████████                                                                                                                            │  14%
 25M   │         ┌── s-h0x1gqqby2-1mdf3k6-24jxlf5lag4v0vlmktn2jvjdd│█████▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓                                                                                                                    │   3%
 25M   │       ┌─┴ procfs-2f2m88sfnv9m0                            │█████▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓                                                                                                                    │   3%
 34M   │       │ ┌── s-h0x19zmz8c-1btx6ww-8dz0hhsxz6pck6g30opy722lu│███████▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓                                                                                                                    │   5%

If you want to test, you can do the following:

# in 1st terminal
dd if=/dev/zero of=img.dd bs=1M count=100     # create a 100 MB file
less img.dd                                   # open the file (type y to confirm), and keep the terminal open

# in 2nd terminal
rm img.dd
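The same situation can also be reproduced from a single Python process, which may be handy for an automated test. This is a sketch under the assumption of a Linux system with /proc mounted; `hold_deleted_file` is a name invented for this example.

```python
import os
import tempfile


def hold_deleted_file(size=1024 * 1024):
    """Create a file, unlink it while it is still open, and report what /proc shows."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * size)
        os.unlink(path)  # same effect as `rm` in the 2nd terminal
        link = os.readlink(f"/proc/self/fd/{fd}")  # kernel's view of the open file
        st = os.fstat(fd)
        return path, link, st.st_nlink, st.st_size
    finally:
        os.close(fd)  # the space is only released once the last handle is closed
```

While the descriptor is open, the link target should end in ` (deleted)`, the link count should be 0, and the size should still be reported in full, even though the path no longer exists on the filesystem.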
