-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix inode eviction and sync writeback deadlock on Linux #17159
base: master
Are you sure you want to change the base?
Conversation
If inode eviction and sync writeback happen on the same inode at the same time, inode eviction will set I_FREEING and wait for sync writeback, and sync writeback may eventually calls zfs_get_data and loop in zfs_zget forever because igrab cannot succeed with I_FREEING, thus causing deadlock. To fix this, in zfs_get_data we call a variant of zfs_zget where we bailout on loop if I_SYNC flag is set, and force the caller to wait for txg sync. Signed-off-by: Chunwei Chen <[email protected]> Fixes openzfs#7964 Fixes openzfs#9430
@@ -1149,10 +1149,21 @@ zfs_get_data(void *arg, uint64_t gen, lr_write_t *lr, char *buf, | |||
ASSERT3P(lwb, !=, NULL); | |||
ASSERT3U(size, !=, 0); | |||
|
|||
error = zfs_zget_impl(zfsvfs, object, &zp, B_TRUE); | |||
#if defined(__linux__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added ifdef because I'm not sure if it's possible to return EAGAIN on freebsd or not.
But I guess looking at the caller, it doesn't really matter if we return EAGAIN or EIO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps it would be more clean to enhance zil_lwb_commit
, which is the user of zfs_get_data
, to recognize EAGAIN
, I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an expert in the znode code, but I don't see any surface level issues.
@@ -1149,10 +1149,21 @@ zfs_get_data(void *arg, uint64_t gen, lr_write_t *lr, char *buf, | |||
ASSERT3P(lwb, !=, NULL); | |||
ASSERT3U(size, !=, 0); | |||
|
|||
error = zfs_zget_impl(zfsvfs, object, &zp, B_TRUE); | |||
#if defined(__linux__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps it would be more clean to enhance zil_lwb_commit
, which is the user of zfs_get_data
, to recognize EAGAIN
, I think
* path, i.e. zfs_get_data, if inode is being | ||
* evicted and I_SYNC is also set. | ||
*/ | ||
if (check_sync && (ZTOI(zp)->i_state & I_SYNC)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe you can move this before the igrab
, take the i_lock
and check for I_FREEING
and I_SYNC
and if they are both set, bail out of the original zfs_zget
without the need for zfs_zget_impl
- have you already considered it? am I overlooking something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only place that needs bail out is if you are in writeback context, either you are the one setting I_SYNC or whoever set it is blocked by you.
If you do bailout for other caller then you will get errors randomly in all sorts of places for no good reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I_SYNC
is set in two kernel functions, writeback_sb_inodes
, writeback_single_inode
. So if you check for I_SYNC
and I_FREEING
and you use i_lock
correctly as you're supposed to, you get more clean and also reliable solution. How is what am saying wrong specifically and why? Which tons of errors?
If inode eviction and sync writeback happen on the same inode at the same time, inode eviction will set I_FREEING and wait for sync writeback, and sync writeback may eventually calls zfs_get_data and loop in zfs_zget forever because igrab cannot succeed with I_FREEING, thus causing deadlock.
To fix this, in zfs_get_data we call a variant of zfs_zget where we bailout on loop if I_SYNC flag is set, and force the caller to wait for txg sync.
Signed-off-by: Chunwei Chen [email protected]
Fixes #7964
Fixes #9430
Motivation and Context
Description
How Has This Been Tested?
Types of changes
Checklist:
Signed-off-by
.