Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix inode eviction and sync writeback deadlock on Linux #17159

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

tuxoko
Copy link
Contributor

@tuxoko tuxoko commented Mar 19, 2025

If inode eviction and sync writeback happen on the same inode at the same time, inode eviction will set I_FREEING and wait for sync writeback, and sync writeback may eventually calls zfs_get_data and loop in zfs_zget forever because igrab cannot succeed with I_FREEING, thus causing deadlock.

To fix this, in zfs_get_data we call a variant of zfs_zget where we bailout on loop if I_SYNC flag is set, and force the caller to wait for txg sync.

Signed-off-by: Chunwei Chen [email protected]
Fixes #7964
Fixes #9430

Motivation and Context

Description

How Has This Been Tested?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

If inode eviction and sync writeback happen on the same inode at the
same time, inode eviction will set I_FREEING and wait for sync
writeback, and sync writeback may eventually calls zfs_get_data and loop
in zfs_zget forever because igrab cannot succeed with I_FREEING, thus
causing deadlock.

To fix this, in zfs_get_data we call a variant of zfs_zget where we
bailout on loop if I_SYNC flag is set, and force the caller to wait for
txg sync.

Signed-off-by: Chunwei Chen <[email protected]>
Fixes openzfs#7964
Fixes openzfs#9430
@@ -1149,10 +1149,21 @@ zfs_get_data(void *arg, uint64_t gen, lr_write_t *lr, char *buf,
ASSERT3P(lwb, !=, NULL);
ASSERT3U(size, !=, 0);

error = zfs_zget_impl(zfsvfs, object, &zp, B_TRUE);
#if defined(__linux__)
Copy link
Contributor Author

@tuxoko tuxoko Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added ifdef because I'm not sure if it's possible to return EAGAIN on freebsd or not.
But I guess looking at the caller, it doesn't really matter if we return EAGAIN or EIO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps it would be more clean to enhance zil_lwb_commit, which is the user of zfs_get_data, to recognize EAGAIN, I think

Copy link
Contributor

@tonyhutter tonyhutter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not an expert in the znode code, but I don't see any surface level issues.

@@ -1149,10 +1149,21 @@ zfs_get_data(void *arg, uint64_t gen, lr_write_t *lr, char *buf,
ASSERT3P(lwb, !=, NULL);
ASSERT3U(size, !=, 0);

error = zfs_zget_impl(zfsvfs, object, &zp, B_TRUE);
#if defined(__linux__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps it would be more clean to enhance zil_lwb_commit, which is the user of zfs_get_data, to recognize EAGAIN, I think

* path, i.e. zfs_get_data, if inode is being
* evicted and I_SYNC is also set.
*/
if (check_sync && (ZTOI(zp)->i_state & I_SYNC))
Copy link
Contributor

@snajpa snajpa Mar 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you can move this before the igrab, take the i_lock and check for I_FREEING and I_SYNC and if they are both set, bail out of the original zfs_zget without the need for zfs_zget_impl - have you already considered it? am I overlooking something?

Copy link
Contributor Author

@tuxoko tuxoko Mar 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only place that needs bail out is if you are in writeback context, either you are the one setting I_SYNC or whoever set it is blocked by you.

If you do bailout for other caller then you will get errors randomly in all sorts of places for no good reason.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I_SYNC is set in two kernel functions, writeback_sb_inodes, writeback_single_inode. So if you check for I_SYNC and I_FREEING and you use i_lock correctly as you're supposed to, you get more clean and also reliable solution. How is what am saying wrong specifically and why? Which tons of errors?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants