
rm -rf on hardlink deletes original files #34

ORESoftware opened this issue Sep 10, 2017 · 4 comments
ORESoftware commented Sep 10, 2017

It appears that

rm -rf on a hard link deletes the original files

This is dangerous; on Linux, if you rm the hard link, the original file is still intact.

Can you confirm / deny this behavior with your lib on MacOS?

@BenjaminHCCarr

@ORESoftware this is the intended/desired behavior on UNIX

All hardlinks point to the same inode, and therefore to the same spot on the disk.
see:
http://www.farhadsaberi.com/linux_freebsd/2010/12/hard-link-soft-symbolic-links.html
https://www.freebsd.org/cgi/man.cgi?query=ln

If your Linux distro is deviating from this, it is not following the UNIX standard.

Hardlinks and symlinks act differently. If you delete a symlink, e.g. ln -s $source $target and then rm $target, you will still have $source; the symlink is just a movable pointer.

Often with symlinks, if you delete $source you will end up with "dead" $target symlinks lying around.
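A minimal sketch of both symlink behaviors, using Python's standard os module (the file names stand in for $source and $target and are made up for illustration):

```python
import os
import tempfile

d = tempfile.mkdtemp()
source = os.path.join(d, "source.txt")   # plays the role of $source
target = os.path.join(d, "target_link")  # plays the role of $target

with open(source, "w") as f:
    f.write("hello")

os.symlink(source, target)   # ln -s $source $target
os.remove(target)            # rm $target: only the pointer goes away
source_survives = os.path.exists(source)

os.symlink(source, target)   # recreate the link, then delete $source instead
os.remove(source)
# The symlink still exists as a pointer, but it now points at nothing.
dangling = os.path.islink(target) and not os.path.exists(target)
```

Removing the target leaves the source intact, while removing the source leaves a "dead" symlink behind.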


So yes, I can confirm that deleting a hardlink deletes the inode, and thus the original data. This is the desired behavior, though.


ORESoftware commented Sep 11, 2017

@BenjaminHCCarr @selkhateeb is there a way to 'remove/undo the hardlink' without deleting the original files?

Do you know if hln will work on Linux? Or just MacOS?


mhelvens commented Oct 6, 2017

@BenjaminHCCarr: Unix standard? I don't think that's true. I don't know about FreeBSD, but both on my Linux box and my Macbook, deleting one of the references of a hard link (created with ln) leaves the others intact. Deleting the last reference deletes the inode. It uses reference counting.
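A quick sketch of that reference-counting behavior with Python's os.link (file names are made up); this is what both Linux and macOS do:

```python
import os
import tempfile

d = tempfile.mkdtemp()
original = os.path.join(d, "original.txt")
link = os.path.join(d, "link.txt")

with open(original, "w") as f:
    f.write("still here")

os.link(original, link)   # hard link: both names reference the same inode
os.remove(original)       # delete one of the two references

# The data is intact as long as at least one reference remains.
with open(link) as f:
    content = f.read()
```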

I'd love to get that functionality here too.


Swivelgames commented Jan 31, 2024

This is a bit of a necropost, but I wanted to put this out there since this is coming up in Google searches.

rm -rf is working as expected. For links, you want unlink.

@ORESoftware The idiomatic way would be to use unlink; however, I'm not sure whether that applies to how this repo achieves hardlinked directories, and there are protections in place that try to prevent you from unlinking directories. It is expected that rm -rf will delete the directory and its contents, by nature of how the command works: -r first purges files recursively until the directory is empty, then deletes the directory itself from the filesystem.


@mhelvens Not for the contents of a directory, if the directory itself is unlinked rather than removed recursively. The behavior that @BenjaminHCCarr is describing is exactly correct in that case. And that can be disastrous on a larger scale, which is why hardlinked directories are generally discouraged.

Deep-dive into why Hardlinked Directories are difficult and dangerous

Directories are just files

In Unix, every file or directory is a "hardlink". In fact, in the actual on-disk data structure of ext-based filesystems, even directories are just files, and their contents are just a map of filenames to their respective inodes:

foo,143927
bar,127694

This would represent a directory containing two hardlinks: foo and bar. Either foo or bar could be a real file (in the traditional sense) or another directory. In either case, they're treated the same.
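You can actually observe this filename-to-inode map from userspace; here's a small sketch using Python's os.scandir (the file names mirror the example above, and the inode numbers will of course differ on your machine):

```python
import os
import tempfile

# A fresh directory with two entries, like the foo/bar example above.
d = tempfile.mkdtemp()
open(os.path.join(d, "foo"), "w").close()
open(os.path.join(d, "bar"), "w").close()

# Each directory entry is literally a (name, inode) pair.
table = {entry.name: entry.inode() for entry in os.scandir(d)}
# table is e.g. {"foo": 143927, "bar": 127694}, with real inode numbers
```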

The inode contains the metadata for the file, including the type of file (such as whether it's a regular file or a directory), the permissions (or mode), the location of the data blocks on the drive where the contents are stored, and a reference counter that tracks how many directories the inode is referenced in.

unlink effectively just deletes the entry from the directory it's in and decrements the reference counter (in fact, when trying to find references after writing this, I found that this is explicitly how IBM describes the unlink command). So:

unlink foo

Would result in:

- foo,143927
  bar,127694

The process then checks to see if the reference counter is 0 for that specific inode (in this example 143927). If it is 0, then we can assume that no other directories are pointing to it, and then the blocks on the drive that it points to are freed for use by new files.
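That reference counter is visible from userspace as st_nlink; a small Python sketch (file names made up) shows it being decremented by unlink:

```python
import os
import tempfile

d = tempfile.mkdtemp()
a = os.path.join(d, "a")
b = os.path.join(d, "b")

with open(a, "w") as f:
    f.write("data")
os.link(a, b)                         # second directory entry, same inode

links_before = os.stat(a).st_nlink    # 2: two entries point at one inode
os.unlink(a)                          # drop one entry, decrement the counter
links_after = os.stat(b).st_nlink     # 1: data blocks are still allocated
```

Only when the counter reaches 0 are the data blocks freed.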

Why Hardlinked Directories are so difficult

So, if we have a hardlinked directory, we don't want to recursively delete it and its subfiles. We simply want to remove the pointer to that directory in that particular location.

In fact, one of the reasons hardlinked directories are avoided is the potential for lost space. For instance, theoretically, we could unlink the last pointer to a directory. The space on the drive that contains the filename-to-inode list itself would be "freed", but all of the files within the directory might be stuck on the drive forever and never freed.

-r to the rescue

This is why we have -r for rm. In order to avoid the headaches described above, we need to explicitly delete each individual file before we unlink the directory itself. In fact, unlink doesn't work on directories, but only because the command itself is very simple and isn't built for that recursion, so it explicitly forbids it. That extra code would make the process inefficient and dangerous, especially if it's not something we explicitly wanted to do.

Otherwise, the filesystem might not realize that those files are no longer being referenced by the directory, because we didn't actually touch those files. We didn't explicitly delete the individual references to them, we just deleted the last list that contained all of those references. With that gone for good, those inodes are orphaned forever without a host directory.

That would be a nightmare, and our large drive would quickly run out of space, and there'd be no telling why.
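A quick Python sketch of that refusal (the exact error type varies by OS, e.g. IsADirectoryError on Linux versus PermissionError on macOS, but both are OSError):

```python
import os
import shutil
import tempfile

d = tempfile.mkdtemp()
sub = os.path.join(d, "sub")
os.mkdir(sub)
open(os.path.join(sub, "file"), "w").close()

# os.unlink (like unlink(2)) refuses directories outright.
try:
    os.unlink(sub)
    refused = False
except OSError:
    refused = True

# rm -rf's approach: delete the contents first, then the directory itself.
shutil.rmtree(sub)
removed = not os.path.exists(sub)
```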

Even though people do, we shouldn't even rm a softlinked directory

The data loss @ORESoftware experienced is actually one of the reasons it is recommended to avoid using rm on softlinked directories in general, and to use unlink on them instead. Getting into the practice of using rm on directory links is dangerous. Instead, rm is a more destructive and capable version of unlink that we only want to use if we explicitly want to purge a directory's contents from the drive.

Further Reading

It's important to understand that unix filesystems don't distinguish between the "original" file/directory and the hardlink. Because of this, we could technically create a hardlink to a file/directory, and then unlink the original location and the data would still exist.

In fact, the move operation works exactly like this. It doesn't explicitly move the data. It simply adds the reference to the new directory, and then removes it from the old directory.

So mv on foo/bar to quux/bar does the following:

$ mv foo/bar quux/
@@ quux/
+ bar,134789
@@ foo/
- bar,134789

That's why move operations are so fast on unix systems! 🙂
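You can verify this from userspace: the inode number survives the move. A small Python sketch using os.rename, which is what mv does within one filesystem (the paths mirror the foo/quux example above):

```python
import os
import tempfile

d = tempfile.mkdtemp()
foo = os.path.join(d, "foo")
quux = os.path.join(d, "quux")
os.mkdir(foo)
os.mkdir(quux)

src = os.path.join(foo, "bar")
with open(src, "w") as f:
    f.write("payload")

inode_before = os.stat(src).st_ino
os.rename(src, os.path.join(quux, "bar"))   # mv foo/bar quux/
inode_after = os.stat(os.path.join(quux, "bar")).st_ino
# Same inode: only the directory entries changed, not the data.
```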

I find all of this super fascinating, so I thought I'd share for those who didn't realize.


When a directory is unlinked, the references inside of that directory aren't checked. The only requirement is that the directory is empty, because doing this type of recursive checking would be way too performance-intensive. So, instead, unlink simply refuses to unlink anything that is a directory, and rm refuses to remove a directory without -r if it isn't empty.

This is explicitly to prevent orphaned inodes. It isn't completely avoidable, though, and that's why we have fsck. But imagine having to run fsck on an in-use filesystem every time you deleted a directory.

That's actually the origin of lost+found. Dangling/orphaned inodes are put in lost+found if their hardlink was destroyed but their inode and data weren't cleaned up. Instead of purging the inode and its data, the assumption is that whatever happened wasn't supposed to, so fsck creates a hardlink to the inode and throws it in lost+found so that it can either be recovered or permanently destroyed with rm -rf.


Path forward

The only realistic path forward for this would be a custom wrapper for unlink that performs the white-glove checks to make sure that things are in order:

  • Check to see if there are other directories pointing to the inode by looking at the reference counter
  • If there are, it's safe to remove the hardlink we asked it to remove
  • Otherwise, throw an error, saying that it's the last link to the directory, so rm -rf must be used instead
  • Also, if the target isn't a directory at all, we just run the built-in unlink on it, so we don't change the way it works
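The checks above could be sketched roughly like this in Python (entirely hypothetical: the name safe_unlink and the st_nlink threshold are my assumptions, and actually unlinking a multiply-linked directory would need filesystem support that no mainstream OS provides; on a normal filesystem the directory branch always refuses):

```python
import os
import tempfile

def safe_unlink(path):
    """Hypothetical wrapper around unlink with the white-glove checks."""
    if os.path.isdir(path) and not os.path.islink(path):
        # An empty directory's st_nlink is typically 2 ("." plus the
        # parent's entry); anything <= 2 means no extra hardlinks exist.
        if os.stat(path).st_nlink <= 2:
            raise OSError(path + ": last link to this directory, "
                          "use rm -rf instead")
        os.unlink(path)  # would drop one of several links to the directory
    else:
        os.unlink(path)  # file or symlink: just the built-in behavior

# File branch: behaves exactly like unlink.
d = tempfile.mkdtemp()
f = os.path.join(d, "plain.txt")
open(f, "w").close()
safe_unlink(f)
file_gone = not os.path.exists(f)

# Directory branch: refuses to remove the last link.
sub = os.path.join(d, "sub")
os.mkdir(sub)
try:
    safe_unlink(sub)
    refused = False
except OSError:
    refused = True
```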
