Skip to content

Detect a slow raidz child during reads #17227

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

robn
Copy link
Member

@robn robn commented Apr 8, 2025

Motivation and Context

Replacing #16900, which was almost finished with review updates but has stalled. I've been asked to take it over.

See original PR for details.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

A single slow responding disk can affect the overall read
performance of a raidz group.  When a raidz child disk is
determined to be a persistent slow outlier, then have it
sit out during reads for a period of time. The raidz group
can use parity to reconstruct the data that was skipped.

Each time a slow disk is placed into a sit out period, its
`vdev_stat.vs_slow_ios count` is incremented and a zevent
class `ereport.fs.zfs.delay` is posted.

The length of the sit out period can be changed using the
`raid_read_sit_out_secs` module parameter.  Setting it to
zero disables slow outlier detection.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Don Brady <[email protected]>
also added test from Tony

Signed-off-by: Don Brady <[email protected]>
@robn
Copy link
Member Author

robn commented Apr 8, 2025

Because there will be a little bouncing between two PRs, and because there's two different authors involved, I'll be pushing fixup commits to this branch. Once everyone is happy with review, I will squash them down for merge.

I'll close out the remaining review comments on #16900, and would like it if new comments could be added here. Thanks all for your patience; I know its a bit fiddly (it'd be nicer if Github would allow a PR to change branches, alas).

don-brady and others added 4 commits April 8, 2025 11:53
also changed
'raidz_read_sit_out_secs' to 'vdev_read_sit_out_secs'

Signed-off-by: Don Brady <[email protected]>
Review feedback.

Signed-off-by: Rob Norris <[email protected]>
@robn robn force-pushed the raidz-detect-slow-outlier branch from 905c466 to 4a1a3d3 Compare April 8, 2025 04:15
@amotin amotin added the Status: Code Review Needed Ready for review and testing label Apr 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Code Review Needed Ready for review and testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants