Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

range_tree: Add zfs_recover_rt parameter and extra debug info #17094

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ihoro
Copy link

@ihoro ihoro commented Feb 25, 2025

Motivation and Context

There are production cases when loading of a metaslab leads to a ZFS panic due to unexpected entries in its spacemap (presumably). The assertions in zfs_range_tree_add_impl() and zfs_range_tree_remove_impl() fail due to overlapping or missing segments, etc. A business would like to go ahead with such pools while the root cause is being investigated.

Description

The idea is to allow loading such metaslabs with a potential space leak as a trade-off instead of a potential data loss.

We already have zfs_recover module parameter to mitigate various issues, including some range tree cases, and this patch adds zfs_recover_ms parameter to localize the recovery behavior to the metaslab loading process only.

The following diagrams are expected to help with the details:

zfs_recover_ms

How Has This Been Tested?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@github-actions github-actions bot added the Status: Work in Progress Not yet ready for general review label Feb 25, 2025
@ihoro ihoro changed the title Add zfs_recover_ms parameter range_tree: Add zfs_recover_rt parameter and extra debug info Mar 4, 2025
@ihoro
Copy link
Author

ihoro commented Mar 4, 2025

The update includes:

  • Re-work into zfs_recover_rt parameter instead of zfs_recover_ms. It covers all range trees.
  • Add range tree name as an extra debug info resulting into a message like zfs: rt_instance=vdev_obsolete_segments: ....
  • The metaslab related trees provide even more details like zfs: rt_instance={spa=p1 vdev_guid=4127788562752866619 ms_id=0 ms_allocatable}: ....
  • Update zfs.4 man page respectively

@ihoro ihoro marked this pull request as ready for review March 4, 2025 13:12
@github-actions github-actions bot added Status: Code Review Needed Ready for review and testing and removed Status: Work in Progress Not yet ready for general review labels Mar 4, 2025
@tonyhutter
Copy link
Contributor

Looks like there's a build issue:

    CC       module/zfs/libzpool_la-dsl_bookmark.lo
    CC       module/zfs/libzpool_la-dsl_crypt.lo
  module/zfs/dnode.c: In function 'dnode_free_range':
  module/zfs/dnode.c:2441:59: error: passing argument 7 of 'zfs_range_tree_create_usecase' discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers]
   2441 |                             ZFS_RANGE_TREE_UC_FREE_SPACE, "dn_free_ranges");
        |                                                           ^~~~~~~~~~~~~~~~
  In file included from ./include/sys/space_map.h:34,
                   from ./include/sys/spa.h:46,
                   from ./include/sys/dbuf.h:32,
                   from module/zfs/dnode.c:28:
  ./include/sys/range_tree.h:294:45: note: expected 'char *' but argument is of type 'const char *'
    294 |     zfs_range_tree_usecase_t usecase, char *instance);
        |                                       ~~~~~~^~~~~~~~
    CC       module/zfs/libzpool_la-dsl_dataset.lo
    CC       module/zfs/libzpool_la-dsl_deadlist.lo
  cc1: all warnings being treated as errors
  make[4]: *** [Makefile:10313: module/zfs/libzpool_la-dnode.lo] Error 1

@ihoro
Copy link
Author

ihoro commented Mar 7, 2025

Looks like there's a build issue:

Indeed, there should have been one more dev iteration on my side. It should be fixed now.

@tonyhutter
Copy link
Contributor

For some reason a bunch of the runners are failing. I'm going to manually restart then.

@tonyhutter
Copy link
Contributor

Please rebase on master; that will pull in the new .github/workflow/* files and allow the tests to complete.

@ihoro
Copy link
Author

ihoro commented Mar 11, 2025

For some reason a bunch of the runners are failing. I'm going to manually restart then.

Thank you.

Please rebase on master; that will pull in the new .github/workflow/* files and allow the tests to complete.

Sure, it's been moved 3 commits up, over the recent workflow changes, it should work now.

Copy link
Member

@amotin amotin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see plenty of places where use case is not defined. We could look better. Also I think in many cases we could still log pool and vdev, even if we have no metaslab number to report.

There are production cases where unexpected range tree segment
adding/removal leads to panic. The root cause investigation requires
more debug info about the range tree and the segments in question when
it happens. In addition, the zfs_recover_rt parameter allows converting
such panics into warnings with a potential space leak as a trade-off.

Signed-off-by: Igor Ostapenko <[email protected]>
@ihoro
Copy link
Author

ihoro commented Mar 13, 2025

I see plenty of places where use case is not defined. We could look better.

It seems it's time to rename the _UC_UNKNOWN flag to the more appropriate one _UC_GENERIC. It depicts the intention better -- a range tree without special treatment where zfs_recover_rt simply does not panic upon unexpected additions/removal.

The unknown (now generic) flag is intentionally used for the range tree instances where special treatment is not expected. Sometimes it's not about allocated/free space or it's a temporary tree which is based on already "recovered" other ones. Anyway, I think I could review the instances once again, probably someone should not be GENERIC.

Also I think in many cases we could still log pool and vdev, even if we have no metaslab number to report.

Yes, it's worth the extra code to be maximum useful. It will come with the next iteration of the patch.

* name string, which can be marked as dynamic to be freed along with the tree
* instance destruction.
*/
#define ZFS_RANGE_TREE_F_UC_GENERIC (1 << 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think "GENERIC" is really meaningful. Easier and cleaner I think would be just to pass 0 if we can't say anything better (we really should).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Code Review Needed Ready for review and testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants